Train, Convert, and Run MobileNet on Sipeed MaixPy and MaixDuino!


Today we introduce how to train, convert, and run a MobileNet model on Sipeed Maix boards, using the easy-to-use MaixPy and MaixDuino~

Prepare environment

install Keras

We choose Keras because it is really easy to use.
First, set up a TensorFlow and Keras environment; we recommend the official TensorFlow Docker image:
docker pull tensorflow/tensorflow:1.13.1-gpu-py3-jupyter

Developers with slow network connections can download the Keras pre-trained MobileNet v1 model manually: download it and then put it into ~/.keras/models/

We suggest using mobilenet_7_5_224_tf_no_top.h5: the name means 0.75x channel width, a 224x224 input size, and the top softmax and dropout layers removed.
Why choose 0.75x? Look at this table:

The 1.0x model has 4.2M parameters, while 0.75x has only 2.6M.
With 8-bit quantization that is 4.2MB vs 2.6MB (about 38% smaller), while the accuracy loss is just 2%.
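As a back-of-envelope check of those figures (our own rough estimate, not an official sizing tool), 8-bit quantization stores about one byte per parameter:

```python
def quantized_size_mb(params, bits=8):
    # approximate weight storage after quantization, in megabytes
    return params * bits / 8 / 1e6

full = quantized_size_mb(4.2e6)    # MobileNet 1.0x -> 4.2 MB
small = quantized_size_mb(2.6e6)   # MobileNet 0.75x -> 2.6 MB
saving = (full - small) / full
print(full, small, round(saving * 100))  # 4.2 2.6 38
```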
MaixDuino can handle both 1.0x and 0.75x, but MaixPy only suits 0.75x (the MicroPython environment costs too much RAM).

download the dataset

We are going to train the classic 1000-class classifier with MobileNet, so let's download the ImageNet 2012 dataset.
It is about 200GB, so make sure you have enough free disk space.
We basically use ILSVRC2012_img_train.tar.
Untar it and you will find 1000 tars inside; untar them again with this script:

dir=./ILSVRC2012_img_train   # path where ILSVRC2012_img_train.tar was extracted
for x in `ls $dir/*.tar`
do
	filename=`basename $x .tar`
	mkdir $dir/$filename
	tar -xvf $x -C $dir/$filename
done

rm $dir/*.tar

You get the imagenet dataset ready now~

Build the MobileNet model for Maix

adjust the original model

The original MobileNet v1 model is located at:

MAIX's K210 chip uses a different padding method from Keras's default, so we need to adjust the model.
The K210 pads zeros on all sides (left, top, right, bottom), while Keras by default pads only on the right and bottom.
We use the ZeroPadding2D function to set the padding method in Keras:
x = layers.ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv1_pad')(inputs)
Add this line before every conv layer that uses a stride greater than 1.
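To see the geometry, here is a small pure-Python sketch of the 1-D convolution output-size arithmetic (the helper function is ours, for illustration only, not a Keras API):

```python
def conv_out_size(in_size, kernel, stride, pad_before, pad_after):
    # output length of a 1-D convolution with explicit padding
    return (in_size + pad_before + pad_after - kernel) // stride + 1

# Keras 'same' padding with stride 2 effectively pads only after (right/bottom)
keras_same = conv_out_size(224, 3, 2, 0, 1)
# K210 pads on both sides, reproduced by ZeroPadding2D((1, 1), (1, 1)) + a 'valid' conv
k210_pad = conv_out_size(224, 3, 2, 1, 1)
print(keras_same, k210_pad)  # 112 112
```

Both schemes give a 112-wide output, but the sampling grid is shifted by one pixel, which is why the weights must be trained with the same padding the K210 actually applies.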

The modified code can be downloaded from our repo:
Replace the original file in Keras (don't forget to back it up).

build our own train script

Let's create a new training script for the MobileNet model.
First we need to finish the model: in the last step we got MobileNet without the "top", so let's add the top:

base_model = keras.applications.mobilenet.MobileNet(input_shape=(224, 224, 3), alpha=0.75, depth_multiplier=1, dropout=0.001, pooling='avg', include_top=False, weights="imagenet", classes=1000)
x = Dropout(0.001, name='dropout')(base_model.output)
preds = Dense(1000, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=preds)

We add dropout and one dense layer to get the prediction label.
We also fix the earlier layers' weights to save training time:

for i, layer in enumerate(model.layers):
    print(i, layer.name)   # list layer indices to find the cut point

for layer in model.layers[:86]:   # freeze the pretrained base
    layer.trainable = False
for layer in model.layers[86:]:   # train only the new top
    layer.trainable = True

The whole script can be downloaded from our GitHub repo.

train it!

Training a model takes a long time, especially on a >100GB dataset.
If you have multiple GPUs, use this to accelerate:
paralleled_model=multi_gpu_model(model, gpus=2)

It takes about 4 hours on my dual-1080Ti machine (I save a checkpoint about every 10~15 minutes):

Epoch 1/20
50/50 [==============================] - 697s 14s/step - loss: 6.0500 - acc: 0.0666 
Epoch 2/20
50/50 [==============================] - 688s 14s/step - loss: 4.1333 - acc: 0.2665
Epoch 3/20
50/50 [==============================] - 696s 14s/step - loss: 3.2263 - acc: 0.3815
Epoch 4/20
50/50 [==============================] - 706s 14s/step - loss: 2.7671 - acc: 0.4442
Epoch 5/20
50/50 [==============================] - 709s 14s/step - loss: 2.5103 - acc: 0.4743
Epoch 6/20
50/50 [==============================] - 708s 14s/step - loss: 2.3257 - acc: 0.4968
Epoch 7/20
50/50 [==============================] - 712s 14s/step - loss: 2.1976 - acc: 0.5190
Epoch 8/20
50/50 [==============================] - 712s 14s/step - loss: 2.0934 - acc: 0.5346
Epoch 9/20
50/50 [==============================] - 721s 14s/step - loss: 2.0263 - acc: 0.5463
Epoch 10/20
50/50 [==============================] - 965s 19s/step - loss: 1.9472 - acc: 0.5575
Epoch 11/20
50/50 [==============================] - 1235s 25s/step - loss: 1.9000 - acc: 0.5608
Epoch 12/20
50/50 [==============================] - 800s 16s/step - loss: 1.8741 - acc: 0.5695
Epoch 13/20
50/50 [==============================] - 769s 15s/step - loss: 1.8432 - acc: 0.5712
Epoch 14/20
50/50 [==============================] - 740s 15s/step - loss: 1.8099 - acc: 0.5767
Epoch 15/20
50/50 [==============================] - 788s 16s/step - loss: 1.7865 - acc: 0.5799
Epoch 16/20
50/50 [==============================] - 796s 16s/step - loss: 1.7474 - acc: 0.5857
Epoch 17/20
50/50 [==============================] - 885s 18s/step - loss: 1.7102 - acc: 0.5945
Epoch 18/20
50/50 [==============================] - 1121s 22s/step - loss: 1.6910 - acc: 0.5977
Epoch 19/20
50/50 [==============================] - 849s 17s/step - loss: 1.6791 - acc: 0.6034
Epoch 20/20
50/50 [==============================] - 761s 15s/step - loss: 1.6745 - acc: 0.6013

Convert Keras model to kmodel

Now we have a model named mbnet75.h5, and we need to convert it to kmodel, the K210's model format.
We have a useful toolbox for model conversion:

First we convert the h5 to pb:

./ --input_model workspace/mbnet75.h5  --output_model workspace/mbnet75.pb

Then we browse the graph:

./ workspace/mbnet75.pb

We find that the input node is "input_1" and the output node is "dense_3/Softmax".

The toolbox also helps us generate the toco command to convert the pb to tflite:

toco --graph_def_file=workspace/mbnet75.pb --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --output_file=workspace/mbnet75.tflite --inference_type=FLOAT --input_type=FLOAT --input_arrays=input_1 --output_arrays=dense_1/Softmax --input_shapes=1,224,224,3

At last, we convert the tflite to kmodel:

./ workspace/mbnet75.tflite

Finally, we get the kmodel file:

-rw-r--r--  1 root root   2655688 Apr 24 09:10 mbnet75.kmodel
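As a rough sanity check (our own estimate, not part of the toolchain): the 8-bit kmodel stores roughly one byte per weight, so the file size should sit close to the 0.75x parameter count:

```python
kmodel_bytes = 2655688   # file size from the listing above
approx_params = 2.6e6    # ~2.6M parameters in 0.75x MobileNet
ratio = kmodel_bytes / approx_params
print(round(ratio, 2))   # 1.02 -> about one byte per parameter
```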

Run kmodel on MaixPy

The mbnet kmodel costs about 2.7MB of RAM; the "full" MaixPy firmware can't fit it, so we need the minimal version of MaixPy (which strips most OpenMV functions and misc functions).

Here is the MaixPy firmware and the mbnet kmodel (packaged as a kfpkg; for the packaging method, refer to: (2.7 MB)
In addition, we need the label list to map class indices to names: (10.0 KB)

Download the firmware, burn the kmodel, then put labels.txt onto the flash or a MicroSD card, and we can run the MobileNet demo in under 30 lines!

import sensor, image, lcd, time
import KPU as kpu
lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_windowing((224, 224))   # model input is 224x224
lcd.draw_string(100, 96, "MobileNet Demo")
lcd.draw_string(100, 112, "Loading labels...")
f = open('labels.txt', 'r')
labels = f.readlines()
f.close()
task = kpu.load(0x200000)          # kmodel burned at flash offset 2MB
clock = time.clock()
while True:
    img = sensor.snapshot()
    clock.tick()
    fmap = kpu.forward(task, img)
    plist = fmap[:]                # 1000-class probabilities
    pmax = max(plist)
    max_index = plist.index(pmax)
    a = lcd.display(img, oft=(0, 0))
    lcd.draw_string(0, 224, "%.2f:%s                            "%(pmax, labels[max_index].strip()))
a = kpu.deinit(task)
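The top-1 lookup inside the loop is plain Python list handling; here is the same logic with toy values (not real model output):

```python
plist = [0.01, 0.90, 0.05, 0.04]                  # toy softmax probabilities
labels = ["cat\n", "husky\n", "wolf\n", "fox\n"]  # lines as read from labels.txt
pmax = max(plist)                  # highest probability
max_index = plist.index(pmax)      # its class index
print("%.2f:%s" % (pmax, labels[max_index].strip()))  # 0.90:husky
```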

Press Ctrl+E to enter paste mode, paste the script, then press Ctrl+D to run it.

We can see it identifies the husky picture correctly~
The fps shown in the serial terminal is about 26fps.
You can make it faster by boosting the CPU and KPU frequencies.
It can go up to CPU 500MHz and KPU 500MHz without modifying the hardware
(or CPU 700MHz and KPU 760MHz with modified hardware and a boosted core voltage).

Run kmodel on MaixDuino



You talk about burning the model, but aren't you supposed to burn the firmware and the model together? Is that done with

Very good blog post otherwise, lots of pointers.


In the debug stage, we'd like to burn the firmware and the model independently.
When the work is finished, you can burn them together by packaging them into one "kfpkg".
Refer to
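For reference, a kfpkg is a zip archive holding the binaries plus a flash-list.json that tells the flasher where to burn each file. A minimal sketch is below; the field names follow the kflash_gui convention as we understand it, so treat the exact schema as an assumption and check the official docs:

```json
{
    "version": "0.1.0",
    "files": [
        { "address": 0,       "bin": "maixpy.bin",     "sha256Prefix": true  },
        { "address": 2097152, "bin": "mbnet75.kmodel", "sha256Prefix": false }
    ]
}
```

2097152 is 0x200000, the 2MB flash offset that kpu.load(0x200000) reads from.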


The blog post was a bit hard to skim. Bottom line: you can flash a kfpkg, which internally is defined to burn the model at 0x200000 (the 2MB offset), and you can load it directly with the Python kpu module (as done in the script at the end).
