Train, Convert, and Run MobileNet on Sipeed MaixPy and MaixDuino!

Today we introduce how to train, convert, and run a MobileNet model on the Sipeed Maix board, using the easy-to-use MaixPy and MaixDuino~

Prepare environment

install Keras

We choose Keras because it is really easy to use.
First you should set up a TensorFlow and Keras environment; we recommend the TensorFlow Docker image:
docker pull tensorflow/tensorflow:1.13.1-gpu-py3-jupyter

If your network speed is poor, you can download the Keras pre-trained MobileNet v1 model manually from https://github.com/fchollet/deep-learning-models/releases and put it into ~/.keras/models/


We suggest using mobilenet_7_5_224_tf_no_top.h5: it has a 0.75x channel count, a 224x224 input image size, and the top softmax/dropout layers removed.
Why choose 0.75x? Compare the two:

The 1.0x model has 4.2M parameters, while 0.75x has only 2.6M.
With 8-bit quantization that is 4.2MB vs 2.6MB (a >40% difference), but the accuracy loss is only about 2%.
MaixDuino can handle both 1.0x and 0.75x, but MaixPy only fits 0.75x (the MicroPython environment itself uses too much RAM).

download the dataset

We are going to train the classic 1000-class classifier with MobileNet, so let's download the ImageNet 2012 dataset:
http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_test.tar
http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar
http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_train.tar
http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_devkit_t12.tar
http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_bbox_train_v2.tar
It is about 200GB in total, so make sure you have enough free disk space.
We mainly use ILSVRC2012_img_train.tar.
Untar it and you will find 1000 tar files inside; untar them again with this script:

#!/bin/bash
# extract each per-class tar into its own directory
dir=./
for x in `ls $dir/*tar`
do
	filename=`basename $x .tar`
	mkdir $dir/$filename	
	tar -xvf $x -C $dir/$filename
done

rm *.tar

Now the ImageNet dataset is ready~
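As a quick sanity check, you can confirm that all 1000 class directories were created (a small sketch; run it from the directory where the script above was executed):

import os

# each ImageNet class tar was extracted into its own directory by the script above
class_dirs = [d for d in os.listdir('.') if os.path.isdir(d)]
print(len(class_dirs))  # should print 1000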

Build the MobileNet model for Maix

adjust the original mobilenet.py

The original MobileNet v1 model definition is located at:
/usr/local/lib/python3.5/dist-packages/keras_applications/mobilenet.py

The Maix's K210 chip uses a different padding method from Keras's default, so we need to adjust it.
The K210 pads zeros on all four sides (left, top, right, bottom), while Keras by default pads only on the right and bottom.
We use the ZeroPadding2D layer to set the padding explicitly for Keras:
x = layers.ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv1_pad')(inputs)
Add a line like this before every conv layer that uses a stride (see the sketch below).
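For example, the pattern for a strided convolution looks roughly like this (a sketch, not the exact code from the modified mobilenet.py; the filter count and layer names are illustrative):

from keras import layers
from keras.layers import Input

inputs = Input(shape=(224, 224, 3))

# pad zeros on all four sides ourselves, then run the strided conv with
# 'valid' padding so Keras adds no implicit (right/bottom-only) padding
x = layers.ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv1_pad')(inputs)
x = layers.Conv2D(24, (3, 3),
                  padding='valid',
                  strides=(2, 2),
                  use_bias=False,
                  name='conv1')(x)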

The modified code can be downloaded from our repo:
https://github.com/sipeed/Maix-Keras-workspace
Replace the original mobilenet.py in Keras with it (don't forget to back up the original).

build your own train script

Let's create a new training script for the MobileNet model.
First we need to complete the model: in the last step we got MobileNet without the "top", so let's add the top back:

import keras
from keras.layers import Dense, Dropout
from keras.models import Model

base_model = keras.applications.mobilenet.MobileNet(input_shape=(224, 224, 3), alpha=0.75, depth_multiplier=1, dropout=0.001, pooling='avg', include_top=False, weights="imagenet", classes=1000)
x = base_model.output
x = Dropout(0.001, name='dropout')(x)
preds = Dense(1000, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=preds)

We add dropout and one dense layer to get the prediction labels.
We also freeze the weights of the earlier layers to save training time:

# print layer indices so we know where to cut
for i, layer in enumerate(model.layers):
    print(i, layer.name)

# freeze everything before layer 86 and train the rest
for layer in model.layers[:86]:
    layer.trainable = False
for layer in model.layers[86:]:
    layer.trainable = True

The whole script (mbnet_keras.py) can be downloaded from our GitHub:
https://github.com/sipeed/Maix-Keras-workspace

train it!

Training takes a long time, especially on a >100GB dataset.
If you have multiple GPUs, use this to accelerate it:
paralleled_model = multi_gpu_model(model, gpus=2)
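For reference, the overall training setup looks roughly like this, continuing from the model built above (a sketch; the paths and hyperparameters are illustrative, the real values are in mbnet_keras.py from the repo):

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras.utils import multi_gpu_model

# wrap the model so each batch is split across both GPUs
paralleled_model = multi_gpu_model(model, gpus=2)
paralleled_model.compile(optimizer='adam',
                         loss='categorical_crossentropy',
                         metrics=['accuracy'])

# one sub-directory per class, as produced by the untar script above
train_datagen = ImageDataGenerator(
    preprocessing_function=keras.applications.mobilenet.preprocess_input)
train_generator = train_datagen.flow_from_directory(
    'ILSVRC2012_img_train/',
    target_size=(224, 224),
    batch_size=128,
    class_mode='categorical')

# save a checkpoint each epoch (with multi_gpu_model you may prefer to checkpoint `model` instead)
callbacks_list = [ModelCheckpoint('workspace/mbnet75.h5')]
step_size_train = 50  # steps per epoch, matching the log below

paralleled_model.fit_generator(generator=train_generator,
                               steps_per_epoch=step_size_train,
                               epochs=20,
                               callbacks=callbacks_list)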

It takes about 4 hours on my dual-1080Ti machine (I save a checkpoint roughly every 10~15 minutes):

Epoch 1/20
50/50 [==============================] - 697s 14s/step - loss: 6.0500 - acc: 0.0666 
Epoch 2/20
50/50 [==============================] - 688s 14s/step - loss: 4.1333 - acc: 0.2665
Epoch 3/20
50/50 [==============================] - 696s 14s/step - loss: 3.2263 - acc: 0.3815
Epoch 4/20
50/50 [==============================] - 706s 14s/step - loss: 2.7671 - acc: 0.4442
Epoch 5/20
50/50 [==============================] - 709s 14s/step - loss: 2.5103 - acc: 0.4743
Epoch 6/20
50/50 [==============================] - 708s 14s/step - loss: 2.3257 - acc: 0.4968
Epoch 7/20
50/50 [==============================] - 712s 14s/step - loss: 2.1976 - acc: 0.5190
Epoch 8/20
50/50 [==============================] - 712s 14s/step - loss: 2.0934 - acc: 0.5346
Epoch 9/20
50/50 [==============================] - 721s 14s/step - loss: 2.0263 - acc: 0.5463
Epoch 10/20
50/50 [==============================] - 965s 19s/step - loss: 1.9472 - acc: 0.5575
Epoch 11/20
50/50 [==============================] - 1235s 25s/step - loss: 1.9000 - acc: 0.5608
Epoch 12/20
50/50 [==============================] - 800s 16s/step - loss: 1.8741 - acc: 0.5695
Epoch 13/20
50/50 [==============================] - 769s 15s/step - loss: 1.8432 - acc: 0.5712
Epoch 14/20
50/50 [==============================] - 740s 15s/step - loss: 1.8099 - acc: 0.5767
Epoch 15/20
50/50 [==============================] - 788s 16s/step - loss: 1.7865 - acc: 0.5799
Epoch 16/20
50/50 [==============================] - 796s 16s/step - loss: 1.7474 - acc: 0.5857
Epoch 17/20
50/50 [==============================] - 885s 18s/step - loss: 1.7102 - acc: 0.5945
Epoch 18/20
50/50 [==============================] - 1121s 22s/step - loss: 1.6910 - acc: 0.5977
Epoch 19/20
50/50 [==============================] - 849s 17s/step - loss: 1.6791 - acc: 0.6034
Epoch 20/20
50/50 [==============================] - 761s 15s/step - loss: 1.6745 - acc: 0.6013

Convert Keras model to kmodel

Now we have a model named mbnet75.h5, and we need to convert it to kmodel, the K210's model format.
We have a useful toolbox of scripts for the model conversion.

First we convert the h5 to pb:

./keras_to_tensorflow.py --input_model workspace/mbnet75.h5  --output_model workspace/mbnet75.pb

Then we browse the graph:

./gen_pb_graph.py workspace/mbnet75.pb



Here we find that the input node is "input_1" and the output node is "dense_1/Softmax".
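If you are not sure of the node names for your own model, a quick way to check them from the Keras side is to print them (a small sketch, assuming a TensorFlow 1.x backend and the model file from above):

from keras.models import load_model

# load the trained model and print the graph names of its input/output tensors
model = load_model('workspace/mbnet75.h5')
print(model.input.op.name)   # e.g. input_1
print(model.output.op.name)  # e.g. dense_1/Softmax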

We use pb2tflite.sh to help generate the command that converts the pb to tflite:

toco --graph_def_file=workspace/mbnet75.pb --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --output_file=workspace/mbnet75.tflite --inference_type=FLOAT --input_type=FLOAT --input_arrays=input_1 --output_arrays=dense_1/Softmax --input_shapes=1,224,224,3

At last, we use tflite2kmodel.sh to convert the tflite file to a kmodel:

./tflite2kmodel.sh workspace/mbnet75.tflite

Finally, we get the kmodel file:

-rw-r--r--  1 root root   2655688 Apr 24 09:10 mbnet75.kmodel

Run kmodel on MaixPy

The mbnet kmodel takes about 2.7MB of RAM, and the "full" MaixPy firmware can't fit it, so we need the minimal version of MaixPy (with most OpenMV and misc functions stripped out).

Here are the MaixPy firmware and the mbnet kmodel (packaged into a kfpkg; for the method, refer to http://blog.sipeed.com/p/390.html):
maixpy_mbnet.zip (2.7 MB)
In addition, we need a label list to map class numbers to names:
labels.zip (10.0 KB)

Download the firmware, burn the kmodel, then put labels.txt on the flash or a MicroSD card, and we can run the MobileNet demo in 30 lines!

import sensor, image, lcd, time
import KPU as kpu

# init LCD and camera with a 224x224 window to match the model input
lcd.init()
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_windowing((224, 224))
sensor.set_vflip(1)
sensor.run(1)
lcd.clear()
lcd.draw_string(100,96,"MobileNet Demo")
lcd.draw_string(100,112,"Loading labels...")

# load the 1000 class names
f=open('labels.txt','r')
labels=f.readlines()
f.close()

# load the kmodel from flash at offset 0x200000 (2MB), where the kfpkg burned it
task = kpu.load(0x200000)
clock = time.clock()
while(True):
    img = sensor.snapshot()
    clock.tick()
    fmap = kpu.forward(task, img)   # run inference on the KPU
    fps = clock.fps()
    plist = fmap[:]
    pmax = max(plist)               # highest probability and its class index
    max_index = plist.index(pmax)
    a = lcd.display(img, oft=(0,0))
    lcd.draw_string(0, 224, "%.2f:%s                            "%(pmax, labels[max_index].strip()))
    print(fps)
a = kpu.deinit(task)

Press Ctrl+E to enter paste mode, paste the script, then press Ctrl+D to run it.

We can see it identifies the husky picture correctly~
The frame rate shown in the serial terminal is about 26fps.
You can make it faster by boosting the CPU and KPU frequencies.
They can go up to CPU 500MHz, KPU 500MHz without any hardware modification
(and up to CPU 700MHz, KPU 760MHz with modified hardware and a boosted core voltage).

Run kmodel on MaixDuino

TODO.


You talk about burning the model, but are you not supposed to burn the firmware and the model together? Is that done with kflash.py?

Very good blog post otherwise, lots of pointers.

In the debug stage, we'd like to burn the firmware and the model independently.
When the work is finished, you can burn them together by packaging them into one "kfpkg".
Refer to http://blog.sipeed.com/p/390.html

The blog post was a bit hard to skim. Bottom line is that you can flash a kfpkg, which internally is defined to burn the model at 0x200000 (the 2MB position), and you can then load it directly from there with the Python kpu module (as done in the script at the end).

When converting to kmodel, ncc reports that the PAD layer is not supported.


You need to use the latest ncc, version 0.4.

In the training script mbnet_keras.py you shrink the images to 128x128. Why do you use mobilenet 224x224?

The functions that shrink to 128 are not used; I forgot to clean up the code.

Gotcha, those 2 image loading functions are only used for testing at the end; I'll remove them.

So I'm using your transfer learning script to categorize 734 images into 4 categories. The images are found:
Found 734 images belonging to 4 classes.
but it's hitting an error:

Epoch 1/20
Traceback (most recent call last):
  File "c:/Users/laurent/Dropbox/AI/mbnet_keras.py", line 89, in <module>
    paralleled_model.fit_generator(generator=train_generator,steps_per_epoch=step_size_train,callbacks=callbacks_list,epochs=20)
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras\engine\training_generator.py", line 181, in fit_generator
    generator_output = next(output_generator)
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras\utils\data_utils.py", line 709, in get
    six.reraise(*sys.exc_info())
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras\utils\data_utils.py", line 685, in get
    inputs = self.queue.get(block=True).get()
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras\utils\data_utils.py", line 626, in next_sample
    return six.next(_SHARED_SEQUENCES[uid])
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras_preprocessing\image\iterator.py", line 100, in __next__
    return self.next(*args, **kwargs)
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras_preprocessing\image\iterator.py", line 112, in next
    return self._get_batches_of_transformed_samples(index_array)
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras_preprocessing\image\iterator.py", line 226, in _get_batches_of_transformed_samples
    interpolation=self.interpolation)
  File "C:\Users\laurent\.conda\envs\tf_gpu\lib\site-packages\keras_preprocessing\image\utils.py", line 102, in load_img
    raise ImportError('Could not import PIL.Image. '
ImportError: Could not import PIL.Image. The use of `array_to_img` requires PIL.

conda install pillow
Zepan, you forgot that in your post

Oh, thank you. I had installed it and forgot to add it to the post.


When converting the model to kmodel I’m getting this error:
Fatal: Layer PAD is not supported
What does it mean and how do I solve that?

By the way, all this can be done on Windows, if you're interested. (I think some people might be put off by the Linux requirement, and maybe you'll sell a few more boards this way.)

The model is now 100% accurate with a 5e-6 loss after 1000 epochs at batch size 300. I think it's overfitted, with only 1000 photos :sweat_smile:
I'll do a nice write-up once the Maix Bit can distinguish obstacles from open space, too low and too high. I shot video in the forest and on our land, then batch-converted it to jpg sequences for classification.

Can you post the pb so I can check?

https://drive.google.com/open?id=1mTOUrn8xWj29ugSoFACg0vSyo0FokDni

Please do not use the PAD op; use SpaceToBatchND instead. Here is the example:
[screenshot of the example graph]

Since I'm using Keras, I don't see where the PAD op comes from, and I'm using your modified mobilenet.py that has ZeroPadding2D((1,1),(1,1)).
Can you tell me how to replace PAD with SpaceToBatchND?

You may need to update keras to the latest version.

2.2.4 is already installed.