
Deploying

The simplest way first.

Deploying is now much easier than before (thanks to @parai).

After you have trained your model in Keras, simply call generate_model(model, x_data) to generate a C header weights.h. The function is available in nnom_utils.py.

Include weights.h in your project, then call nnom_model_create() to create and compile the model on the MCU. Finally, call model_run() to run your prediction.
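A minimal sketch of the MCU side. The buffer names in the comments (nnom_input_data / nnom_output_data) are assumptions about what the generator typically emits; check your generated weights.h for the actual names.

    #include "nnom.h"
    #include "weights.h"   /* generated by generate_model() in nnom_utils.py */

    int main(void)
    {
        /* create and compile the model described by weights.h */
        nnom_model_t *model = nnom_model_create();

        while (1)
        {
            /* fill the input buffer declared in weights.h with one sample,
               e.g. copy it into nnom_input_data[], then run the model */
            model_run(model);
            /* read the prediction back from the output buffer,
               e.g. nnom_output_data[] */
        }
    }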

Please check the MNIST-DenseNet example for usage.

generate_model(model, x_data) might not always be up to date with NNoM. For new features and customized layers, you can still use the NNoM APIs to build your model directly.

NNoM uses a layer-based structure.

A layer is a container. Every operation (convolution, concat...) must be wrapped into a layer.

A basic layer contains a list of Input/Output modules (I/O). Each I/O contains a list of Hooks (similar to Nodes in Keras).

A Hook stores the link to another layer's I/O.

An I/O is a buffer that stores the input/output data of the operation.

Don't be scared, check this:

The APIs listed below will help you create layers and build the model structure.

Layer APIs and Construction APIs are used to build a model.

Layer APIs can create and return a new layer instance, while construction APIs use layer instances to build a model.

Layer APIs, such as Conv2D(), Dense(), and Activation(), can be found in nnom_layers.h.

Construction APIs, such as model.hook(), model.merge(), and model.add(), can be found in new_model() in nnom.c.

For example, to add a convolution layer to a sequential model, use model.add():
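A minimal sketch of a sequential model. The layer sizes, weight/bias structs (conv1_w, fc1_w, ...) and the input/output buffers are placeholders, not values from a real model; the compile call follows the library's own spelling.

    nnom_model_t *model = new_model(NULL);

    model->add(model, Input(shape(28, 28, 1), input_buffer));
    model->add(model, Conv2D(12, kernel(3, 3), stride(1, 1), PADDING_SAME, &conv1_w, &conv1_b));
    model->add(model, ReLU());
    model->add(model, MaxPool(kernel(2, 2), stride(2, 2), PADDING_VALID));
    model->add(model, Dense(10, &fc1_w, &fc1_b));
    model->add(model, Softmax());
    model->add(model, Output(shape(10, 1, 1), output_buffer));

    sequencial_compile(model);   /* spelling as in the library */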

In a functional model, the links between layers are specified explicitly using model.hook() or model.merge():
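A hedged sketch of a functional model with two parallel branches merged by concatenation. Sizes, weight structs and buffers are placeholders.

    nnom_model_t *model = new_model(NULL);
    nnom_layer_t *input_layer, *x, *x1, *x2;

    input_layer = Input(shape(28, 28, 1), input_buffer);
    x  = model->hook(Conv2D(16, kernel(3, 3), stride(1, 1), PADDING_SAME, &c1_w, &c1_b), input_layer);
    /* two parallel branches reading the same output of x */
    x1 = model->hook(Conv2D(16, kernel(3, 3), stride(1, 1), PADDING_SAME, &c2_w, &c2_b), x);
    x2 = model->hook(Conv2D(16, kernel(5, 5), stride(1, 1), PADDING_SAME, &c3_w, &c3_b), x);
    /* merge the branches along the channel axis */
    x  = model->merge(Concat(-1), x1, x2);
    /* ... hook further layers, ending with an Output() layer ... */
    model_compile(model, input_layer, x);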

NNoM currently supports the HWC format, also called 'channel last', where H = number of rows (y axis), W = number of columns (x axis), and C = number of channels.

For example:
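A minimal sketch of the shape helpers; the numbers are illustrative and the type name follows nnom_layers.h.

    nnom_shape_t in = shape(28, 28, 1);   /* H = 28 rows, W = 28 columns, C = 1 channel */
    nnom_shape_t k  = kernel(3, 3);       /* kernel (H, W) */
    nnom_shape_t s  = stride(1, 1);       /* stride (H, W) */

    /* 1D data: set H to 1, e.g. a 1x9 kernel sliding over a 1xN sequence */
    nnom_shape_t k1d = kernel(1, 9);
    nnom_shape_t s1d = stride(1, 4);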

In the code above, both kernel(H, W) and stride(H, W) return a 'shape' instance. The shape instance is in the format (H, W, ?).

All convolution and pooling layers support both 1D and 2D data input. However, when using 1D input, H must be set to 1.

Construction APIs

Construction APIs are static functions located in nnom.c.

Currently they are (both sketched below):

Sequential Construction API

Functional Construction API
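A hedged sketch of how these methods are used on a model instance. The variable names (layer, previous_layer, method, activation, target_layer, out) are placeholders; check nnom.c and nnom.h for the exact prototypes.

    /* sequential construction: append a layer to the model */
    model->add(model, layer);

    /* functional construction: link and merge layers explicitly */
    out = model->hook(layer, previous_layer);
    out = model->merge(method, in1, in2);
    out = model->mergex(method, 3, in1, in2, in3);
    out = model->active(activation, target_layer);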

For model.active(), please check Activation APIs below.

Layer APIs

Layer APIs are listed in nnom_layers.h.

Input/output layers are necessary for a model. They are responsible for copying data in from the user's input buffer and copying results out to the user's output buffer.

The pooling layers:
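A hedged listing of the pooling layer APIs; argument lists are paraphrased, so check nnom_layers.h for the exact prototypes.

    nnom_layer_t *MaxPool(nnom_shape_t k, nnom_shape_t s, nnom_padding_t pad);
    nnom_layer_t *AvgPool(nnom_shape_t k, nnom_shape_t s, nnom_padding_t pad);
    nnom_layer_t *SumPool(nnom_shape_t k, nnom_shape_t s, nnom_padding_t pad);
    nnom_layer_t *GlobalMaxPool(void);
    nnom_layer_t *GlobalAvgPool(void);
    nnom_layer_t *GlobalSumPool(void);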

The sum pooling here will dynamically change its output shift to avoid overflow.

It is recommended to replace Global Average Pooling with Global Sum Pooling for better accuracy on the MCU side.


Activation Layer APIs start with a capital letter and return a layer instance. They differ from the Activation APIs, which start with act_* and return an activation instance. Please check the Activation APIs section below for more detail.
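A small sketch of the difference, using ReLU as the example; other activations follow the same pattern.

    /* Layer API: a standalone activation layer, added like any other layer */
    model->add(model, ReLU());

    /* Activation API: an activation instance attached to an existing layer
       through model.active(); see the Activation APIs section below */
    x = model->active(act_relu(), x);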

Matrix API.

These layers normally take the outputs of two or more layers as their inputs.

They are also called 'merging methods', and must be used through model.merge(method, in1, in2) or model.mergex(method, num_of_inputs, in1, in2, in3 ...).

Flatten changes the shape to (x, 1, 1).

These are the stable NN layers. For more layers still under development, please check the source code.

About the missing Batch Normalization Layer

A Batch Normalization layer can be fused into the convolution layer before it, so NNoM currently does not provide a Batch Normalization layer. It might be implemented as a standalone layer in the future; for now, please fuse it into the preceding convolution layer.
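For reference, the standard per-channel fusion arithmetic is shown below. This is a generic sketch done on the floating-point weights before quantisation, not an NNoM API.

    #include <math.h>

    /* Fold BatchNorm parameters (gamma, beta, moving mean/variance, eps) of one
       output channel into that channel's convolution weights and bias. */
    void fuse_bn_channel(float *w, int n_w, float *b,
                         float gamma, float beta, float mean, float var, float eps)
    {
        float scale = gamma / sqrtf(var + eps);
        for (int i = 0; i < n_w; i++)
            w[i] *= scale;                   /* w' = w * gamma / sqrt(var + eps)   */
        *b = (*b - mean) * scale + beta;     /* b' = (b - mean) * scale + beta     */
    }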

Activation APIs

Activation APIs were not part of the original idea; the original idea was to make everything a layer.


However, a single layer instance costs a large amount of memory (100~150 bytes), while activations are relatively simple: they mostly have the same input/output shape and few or no parameters.

Therefore, to reduce complexity, an 'actail' (activation tail) is added to each layer instance. If a layer's actail is not null, it is called right after the layer is executed. An actail takes an activation instance as input. The model API model.active() attaches an activation to a layer's actail.

The Activation APIs are listed in nnom_activations.h

Model API

A model instance contains the starting layer, the end layer and other necessary info.

Please refer to the examples for usage

Known Issues

Shared output buffer destroyed by single buffer layers (input-destructive)

Single buffer layers (such as most activations, and additionally MaxPool/AvgPool) work directly on their input buffer. When that input buffer is shared with other parallel layers, and the single buffer layer is placed before the other layers in a parallel structure (such as Inception), the shared buffer will be destroyed by the input-destructive layer before the other parallel layers can access it.


Additionally, although MaxPool & AvgPool are not single buffer layers, they destroy their input buffer, since the underlying CMSIS-NN pooling functions are input-destructive. So they should be treated the same as single buffer layers.

Fix plan for the issue

Not planned.

Lamb Sausage Merguez

Possibly, add an invisible copying layer/function that copies the data for the single buffer layer before it is passed to the other parallel layers.

Current workarounds

Workaround 1


If the Inception block has only one single buffer layer, always hook that single buffer layer last. For example, instead of hooking MaxPool - Conv2D - Conv2D, hook Conv2D - Conv2D - MaxPool, as sketched below.
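A hedged sketch of this ordering in an Inception-style block; sizes and weight structs are placeholders.

    /* hook the buffer-preserving branches first ... */
    x1 = model->hook(Conv2D(16, kernel(3, 3), stride(1, 1), PADDING_SAME, &b1_w, &b1_b), x);
    x2 = model->hook(Conv2D(16, kernel(1, 1), stride(1, 1), PADDING_SAME, &b2_w, &b2_b), x);
    /* ... and the input-destructive MaxPool branch last */
    x3 = model->hook(MaxPool(kernel(3, 3), stride(1, 1), PADDING_SAME), x);

    x = model->mergex(Concat(-1), 3, x1, x2, x3);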

Workaround 2

If there are multiple single buffer layers, add an extra multiple-buffer layer before each single buffer layer, for example a Lambda() layer that copies the buffer.

Evaluation APIs

The evaluation methods are listed in nnom_utils.h.

They run the model with testing data, then evaluate the model, including Top-k accuracy, confusion matrix, and runtime statistics.

Please refer to the UCI HAR example for usage.

Demo of Evaluation

The UCI HAR example runs on RT-Thread. It uses Y-Modem to receive the testing dataset, a ring buffer to store the data, and the console (msh) to print the results.

The layer order, activation, output shape, operations, I/O memory, and assigned memory blocks are shown. It also summarises the memory cost of the neural network.

Type predic, then use Y-Modem to send the data file. The model will run once enough data is received.

When the file transfer is done, the runtime summary, Top-k accuracy and confusion matrix will be printed.

Optionally, the detailed runtime statistics of each layer can be printed with nn_stat.

PS: The 'runtime stat' in the animation is not correct, because the test chip is overclocked (STM32L476 @ 160MHz, 2x overclocked) and the timer is overclocked as well.

However, the numbers in the prediction summary are correct, because they are measured by the system_tick timer, which is not overclocked.