This post was last edited by 不爱胡萝卜的仓鼠 on 2024-11-3 20:59
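Before training my own model and running it on the development board, I first want to port an existing model to the STM32 board to build up some experience.
A trained model cannot run on an STM32 directly. My rough understanding is that the STM32 is just a microcontroller, and its hardware resources and software architecture cannot run an AI model as-is. A tool is therefore needed to convert the model into C code that the STM32 can run. CUBE AI is the tool ST provides to developers for this purpose.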
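1. Installing CUBE AI
CUBE AI is a software package for CubeMX. Installing it is simple: open CubeMX and click "Help" -> "Manage embedded software packages".
In the dialog that pops up, find "STMicroelectronics" -> "X-CUBE-AI", select the version you need, and click "install". I installed 9.0.0 some time ago. At the time of writing the latest version is 9.1.0, but I was lazy and did not update.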
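2. Porting a model with CUBE AI
2.1 Activating CUBE AI
Open our serial port project, find the "X-CUBE-AI" option on the left, and click it.
The following dialog then pops up.
Select the CUBE AI version; I installed 9.0.0, so I select 9.0.0. Then tick Core, and choose the APP according to your needs; I chose Validation here. The three APP options mean the following.
Once it is enabled, the screen looks as follows. All the check marks are ticked automatically, so there is no need to tick anything by hand.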
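2.2 Adding a network
Next we need to add a network (that is, add an AI model). Since I have no model trained by myself, I simply use the model from the project I downloaded from the cloud AI tool last time (this model also comes from ST's Model Zoo; visit https://stm32ai.st.com/model-zoo/ to get the models ST has prepared, which you can also retrain with additional samples of your own).
Select the model type. Mine is Keras; TFLite and ONNX are also supported. Choose whichever matches your case, then select the network file.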
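As a rough illustration of the model-type choice, the sketch below maps a model file's extension to the type you would pick in the dialog. The helper `guess_model_type` and its lookup table are hypothetical and not part of X-CUBE-AI; only the `.h5` to `keras` pairing is confirmed by the analysis log later in this post.

```python
from pathlib import Path

# Hypothetical lookup table for illustration; not taken from X-CUBE-AI itself.
MODEL_TYPES = {
    ".h5": "keras",
    ".tflite": "tflite",
    ".onnx": "onnx",
}


def guess_model_type(model_path: str) -> str:
    """Return the model type to select for a given model file name."""
    suffix = Path(model_path).suffix.lower()
    if suffix not in MODEL_TYPES:
        raise ValueError(f"unsupported model file: {model_path}")
    return MODEL_TYPES[suffix]


if __name__ == "__main__":
    name = "CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5"
    print(guess_model_type(name))
```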
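The serial output needs no attention: the UART was already initialized in the earlier project, and it is matched automatically here.
That completes adding the model.
Next, analyze the model. Without this step, we will not be allowed to generate the project later.
I got an error here and had to modify the registry (I won't go into how to edit the registry; a quick Baidu search will tell you).
After the change, I ran the analysis again and could see the progress bar moving.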
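Behind the Analyze button, CubeMX calls the `stedgeai.exe` command-line tool; the exact command is printed at the top of the log further down in this post. A minimal sketch that rebuilds the same argument list, using only flags that appear in that log. The function name `build_stedgeai_args` and the placeholder paths are hypothetical.

```python
def build_stedgeai_args(command, model, workspace, output,
                        target="stm32h7", name="network", desc=None):
    """Build the stedgeai argument list in the order seen in the CubeMX log.

    command: "analyze" or "validate" (the two sub-commands shown in this post).
    desc:    optional target descriptor such as "serial:COM49:115200";
             when given, the on-target flags from the log are appended.
    """
    args = [
        "stedgeai.exe", command,
        "--target", target,
        "--name", name,
        "-m", model,
        "--compression", "none",
        "--verbosity", "1",
        "--allocate-inputs",
        "--allocate-outputs",
        "--workspace", workspace,
        "--output", output,
    ]
    if desc is not None:
        args += ["--mode", "target", "--desc", desc]
    return args


if __name__ == "__main__":
    # Placeholder paths; substitute your own model and directories.
    print(" ".join(build_stedgeai_args("analyze", "model.h5", "workspace_dir", "output_dir")))
```

The resulting list can be handed to `subprocess.run(args, check=True)` if you want to run the analysis outside CubeMX, provided `stedgeai.exe` is reachable from your shell.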
The completed analysis is shown below.
Analyzing model
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe analyze --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171808875528810010910823047711904506 --output C:/Users/Administrator/.stm32cubemx/network_output
STEdgeAI Core v9.0.0-19802
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171808875528810010910823047711904506\network_c_info.json
Exec/report summary (analyze)
----------------------------------------------------------------------------------------------------------------------------------------
model file : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type : keras
c_name : network
compression : none
options : allocate-inputs, allocate-outputs
optimization : balanced
target/series : stm32h7
workspace dir : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171808875528810010910823047711904506
output dir : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt : float
model_name : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash : 0x1e108c42827f4c62598744246d259703
params # : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1 : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1 : 'dense_1', f32(1x8), 32 Bytes, activations
macc : 8,520
weights (ro) : 11,008 B (10.75 KiB) (1 segment)
activations (rw) : 1,024 B (1024 B) (1 segment) *
ram (total) : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Model name - CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
------ ------------------------------ ------------------- ------------- ------- ---------------
m_id layer (original) oshape param/size macc connected to
------ ------------------------------ ------------------- ------------- ------- ---------------
0 input_1 (InputLayer) [b:1,h:8,w:8,c:2]
------ ------------------------------ ------------------- ------------- ------- ---------------
1 conv2d (Conv2D) [b:1,h:6,w:6,c:8] 152/608 5,192 input_1
------ ------------------------------ ------------------- ------------- ------- ---------------
2 activation (Activation) [b:1,h:6,w:6,c:8] 288 conv2d
------ ------------------------------ ------------------- ------------- ------- ---------------
3 max_pooling2d (MaxPooling2D) [b:1,h:3,w:3,c:8] 288 activation
------ ------------------------------ ------------------- ------------- ------- ---------------
5 flatten (Flatten) [b:1,c:72] max_pooling2d
------ ------------------------------ ------------------- ------------- ------- ---------------
6 dense_dense (Dense) [b:1,c:32] 2,336/9,344 2,336 flatten
dense (Dense) [b:1,c:32] 32 dense_dense
------ ------------------------------ ------------------- ------------- ------- ---------------
7 dense_1_dense (Dense) [b:1,c:8] 264/1,056 264 dense
dense_1 (Dense) [b:1,c:8] 120 dense_1_dense
------ ------------------------------ ------------------- ------------- ------- ---------------
model: macc=8,520 weights=11,008 activations=-- io=--
Number of operations per c-layer
------- ------ ------------------------ ------- --------------
c_id m_id name (type) #op type
------- ------ ------------------------ ------- --------------
0 3 conv2d (Conv2D) 5,768 smul_f32_f32
1 6 dense_dense (Dense) 2,336 smul_f32_f32
2 6 dense (Nonlinearity) 32 op_f32_f32
3 7 dense_1_dense (Dense) 264 smul_f32_f32
4 7 dense_1 (Nonlinearity) 120 op_f32_f32
------- ------ ------------------------ ------- --------------
total 8,520
Number of operation types
---------------- ------- -----------
operation type # %
---------------- ------- -----------
smul_f32_f32 8,368 98.2%
op_f32_f32 152 1.8%
Complexity report (model)
------ --------------- ------------------------- ------------------------- --------
m_id name c_macc c_rom c_id
------ --------------- ------------------------- ------------------------- --------
3 max_pooling2d |||||||||||||||| 67.7% | 5.5% [0]
6 dense_dense ||||||| 27.8% |||||||||||||||| 84.9% [1, 2]
7 dense_1_dense | 4.5% || 9.6% [3, 4]
------ --------------- ------------------------- ------------------------- --------
macc=8,520 weights=11,008 act=1,024 ram_io=0
Requested memory size per segment ("stm32h7" series)
----------------------------- -------- -------- ------- -------
module text rodata data bss
----------------------------- -------- -------- ------- -------
NetworkRuntime900_CM7_GCC.a 10,220 0 0 0
network.o 584 40 1,796 168
network_data.o 52 16 88 0
lib (toolchain)* 318 328 0 0
----------------------------- -------- -------- ------- -------
RT total** 11,174 384 1,884 168
----------------------------- -------- -------- ------- -------
weights 0 11,008 0 0
activations 0 0 0 1,024
io 0 0 0 0
----------------------------- -------- -------- ------- -------
TOTAL 11,174 11,392 1,884 1,192
----------------------------- -------- -------- ------- -------
* toolchain objects (libm/libgcc*)
** RT - AI runtime objects (kernels+infrastructure)
Summary per type of memory device
--------------------------------------------
FLASH % RAM %
--------------------------------------------
RT total 13,442 55.0% 2,052 66.7%
--------------------------------------------
TOTAL 24,450 3,076
--------------------------------------------
Creating txt report file C:\Users\Administrator\.stm32cubemx\network_output\network_analyze_report.txt
elapsed time (analyze): 7.829s
Model file: CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
Total Flash: 24450 B (23.88 KiB)
Weights: 11008 B (10.75 KiB)
Library: 13442 B (13.13 KiB)
Total Ram: 3076 B (3.00 KiB)
Activations: 1024 B
Library: 2052 B (2.00 KiB)
Input: 512 B (included in Activations)
Output: 32 B (included in Activations)
Done
Analyze complete on AI model
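The totals in this report can be cross-checked by hand. A minimal sketch, assuming a 3x3 convolution kernel (inferred from the 8x8 input shrinking to a 6x6 output; the report does not state it) and 4-byte float32 parameters. All other numbers are copied from the report above.

```python
def conv2d_params(kh, kw, c_in, c_out):
    """Weights plus one bias per output channel."""
    return kh * kw * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


# Layer shapes from the "Model name" table; the 3x3 kernel is an assumption.
params = {
    "conv2d": conv2d_params(3, 3, 2, 8),   # report: 152
    "dense": dense_params(72, 32),         # report: 2,336
    "dense_1": dense_params(32, 8),        # report: 264
}
total_params = sum(params.values())        # report: 2,752 items
weights_bytes = total_params * 4           # float32 -> report: 11,008 B

# "RT total" row of the per-segment table.
rt = {"text": 11174, "rodata": 384, "data": 1884, "bss": 168}
activations = 1024

# In the report's sums, initialized data counts toward both flash and RAM.
rt_flash = rt["text"] + rt["rodata"] + rt["data"]   # report: 13,442
rt_ram = rt["data"] + rt["bss"]                     # report: 2,052
total_flash = rt_flash + weights_bytes              # report: 24,450
total_ram = rt_ram + activations                    # report: 3,076

if __name__ == "__main__":
    print(params, total_params, weights_bytes)
    print(f"flash: {total_flash} B, runtime share {100 * rt_flash / total_flash:.1f}%")
    print(f"ram:   {total_ram} B, runtime share {100 * rt_ram / total_ram:.1f}%")
```

For a model this small, the runtime library takes more than half of the flash and about two thirds of the RAM.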
Here we can also run a simulated test on the PC.
After the PC simulation finishes, the log is as follows.
Starting AI validation on desktop with random data...
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe validate --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171971186255410016731057912494845799 --output C:/Users/Administrator/.stm32cubemx/network_output
STEdgeAI Core v9.0.0-19802
Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 8, 8, 2)/float32, min/max=[0.005, 1.000], mean/std=[0.498, 0.294], input_1
No output/reference samples are provided
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\network_c_info.json
Copying the AI runtime files to the user workspace: C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\inspector_network\workspace
Exec/report summary (validate)
----------------------------------------------------------------------------------------------------------------------------------------
model file : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type : keras
c_name : network
compression : none
options : allocate-inputs, allocate-outputs
optimization : balanced
target/series : stm32h7
workspace dir : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799
output dir : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt : float
model_name : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash : 0x1e108c42827f4c62598744246d259703
params # : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1 : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1 : 'dense_1', f32(1x8), 32 Bytes, activations
macc : 8,520
weights (ro) : 11,008 B (10.75 KiB) (1 segment)
activations (rw) : 1,024 B (1024 B) (1 segment) *
ram (total) : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Running the Keras model...
Running the STM AI c-model (AI RUNNER)...(name=network, mode=HOST)
X86 shared lib (C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\inspector_network\workspace\lib\libai_network.dll) ['network']
Summary 'network' - ['network']
------------------------------------------------------------------------------------------
inputs/ouputs : 1/1
input_1 : f32[1,8,8,2], 512 Bytes, in activations buffer
output_1 : f32[1,1,1,8], 32 Bytes, in activations buffer
n_nodes : 5
compile_datetime : Nov 3 2024 20:50:43
activations : 1024
weights : 11008
macc : 8520
------------------------------------------------------------------------------------------
tools : Legacy ST.AI 9.0.0
capabilities : IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
device : AMD64, AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD, Windows
------------------------------------------------------------------------------------------
NOTE: duration and exec time per layer is just an indication. They are dependent of the HOST-machine work-load.
ST.AI Profiling results v1.2 - "network"
------------------------------------------------------------
nb sample(s) : 10
duration : 0.015 ms by sample (0.008/0.069/0.018)
macc : 8520
------------------------------------------------------------
Inference time per node
-------------------------------------------------------------------------------
c_id m_id type dur (ms) % cumul name
-------------------------------------------------------------------------------
0 3 Conv2dPool (0x109) 0.009 58.4% 58.4% ai_node_0
1 6 Dense (0x104) 0.003 17.4% 75.8% ai_node_1
2 6 NL (0x107) 0.001 8.1% 83.9% ai_node_2
3 7 Dense (0x104) 0.000 2.7% 86.6% ai_node_3
4 7 NL (0x107) 0.002 12.8% 99.3% ai_node_4
-------------------------------------------------------------------------------
total 0.015
-------------------------------------------------------------------------------
Statistic per tensor
-------------------------------------------------------------------------------
tensor # type[shape]:size min max mean std name
-------------------------------------------------------------------------------
I.0 10 f32[1,8,8,2]:512 0.005 1.000 0.498 0.294 input_1
O.0 10 f32[1,1,1,8]:32 0.000 1.000 0.125 0.321 output_1
-------------------------------------------------------------------------------
Saving validation data...
output directory: C:\Users\Administrator\.stm32cubemx\network_output
creating C:\Users\Administrator\.stm32cubemx\network_output\network_val_io.npz
m_outputs_1: (10, 1, 1, 8)/float32, min/max=[0.000, 1.000], mean/std=[0.125, 0.321], dense_1
c_outputs_1: (10, 1, 1, 8)/float32, min/max=[0.000, 1.000], mean/std=[0.125, 0.321], dense_1
Computing the metrics...
Cross accuracy report #1 (reference vs C-model)
----------------------------------------------------------------------------------------------------
notes: - the output of the reference model is used as ground truth/reference value
- 10 samples (8 items per sample)
acc=100.00%, rmse=0.000000063, mae=0.000000015, l2r=0.000000183, nse=1.000, cos=1.000
8 classes (10 samples)
------------------------------------------------
C0 10 . . . . . . .
C1 . 0 . . . . . .
C2 . . 0 . . . . .
C3 . . . 0 . . . .
C4 . . . . 0 . . .
C5 . . . . . 0 . .
C6 . . . . . . 0 .
C7 . . . . . . . 0
Evaluation report (summary)
--------------------------------------------------------------------------------------------------------------------------------------
Output acc rmse mae l2r mean std nse cos tensor
--------------------------------------------------------------------------------------------------------------------------------------
X-cross #1 100.00% 0.0000001 0.0000000 0.0000002 -0.0000000 0.0000001 1.0000000 1.0000000 dense_1, (8,), m_id=[7]
--------------------------------------------------------------------------------------------------------------------------------------
acc : Classification accuracy (all classes)
rmse : Root Mean Squared Error
mae : Mean Absolute Error
l2r : L2 relative error
nse : Nash-Sutcliffe efficiency criteria, bigger is better, best=1, range=(-inf, 1]
cos : COsine Similarity, bigger is better, best=1, range=(0, 1]
Creating txt report file C:\Users\Administrator\.stm32cubemx\network_output\network_validate_report.txt
elapsed time (validate): 7.011s
Validation ended
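The metrics in the legend above compare the outputs of the original Keras model with those of the generated C model. The sketch below uses the textbook definitions of these quantities in plain Python; it is an assumption that the ST tool computes them the same way, and the function name `cross_metrics` is hypothetical.

```python
import math


def argmax(values):
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def cross_metrics(ref, pred):
    """Compare reference outputs with C-model outputs.

    ref, pred: lists of equally sized output vectors, one per sample.
    Assumes ref is neither all zeros nor constant (avoids division by zero).
    """
    flat_r = [v for row in ref for v in row]
    flat_p = [v for row in pred for v in row]
    n = len(flat_r)
    sq_err = sum((p - r) ** 2 for r, p in zip(flat_r, flat_p))
    mean_r = sum(flat_r) / n
    norm_r = math.sqrt(sum(r * r for r in flat_r))
    norm_p = math.sqrt(sum(p * p for p in flat_p))
    return {
        # Share of samples where both models pick the same class.
        "acc": sum(argmax(r) == argmax(p) for r, p in zip(ref, pred)) / len(ref),
        "rmse": math.sqrt(sq_err / n),
        "mae": sum(abs(p - r) for r, p in zip(flat_r, flat_p)) / n,
        "l2r": math.sqrt(sq_err) / norm_r,
        "nse": 1.0 - sq_err / sum((r - mean_r) ** 2 for r in flat_r),
        "cos": sum(r * p for r, p in zip(flat_r, flat_p)) / (norm_r * norm_p),
    }


if __name__ == "__main__":
    reference = [[0.0, 1.0], [1.0, 0.0]]
    c_model = [[0.0, 1.0], [0.9, 0.1]]
    print(cross_metrics(reference, c_model))
```

Note that the input here is random data, so the 100% accuracy only says that the C model agrees with the Keras model; it says nothing about how well the model recognizes real hand postures. That is also why all 10 samples land in class C0 in the confusion matrix.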
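Next we can choose whether to compress the model, and whether to optimize for balance, speed, or RAM usage. This is similar to what we did earlier with the cloud AI tool, so I won't go into detail. By tuning these options and simulating on the PC, we can get an expected result before generating the project, which saves a lot of time.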
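2.3 Other peripherals that must be enabled
The CPU's I-Cache, D-Cache, and ART must all be enabled (I could not find ART here, so I left it alone; if your chip has it, turn it on).
The CRC must also be enabled.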
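2.4 Generating the project
Before generating the project, set the minimum heap size to 0x2000 (8 KiB), then generate the project.
The project must be generated without warnings. Earlier I had not analyzed the model and got a warning; forcing generation at that point may cause problems.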
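2.5 Minor fixes
My project is a bit unusual, so it needs a few extra fixes. A normal project does not need them.
First, the ld file has to be switched back (see the earlier post for the exact steps).
Second, a file named syscall.c gets deleted, which breaks the build, so this file has to be restored. (Restoring it produces build warnings, but the code still runs, so it is not a big problem; I have not found a fix for now.)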
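3. Running the Validation test project
Because we selected the Validation test project earlier, the board prints various information after power-up and then waits for the user to send a CMD.
The power-up log is as follows.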
[20:41:39.248] RX:
#
# AI Validation 7.1
#
Compiled with GCC 12.3.1
STM32 device configuration...
Device : DevID:0x0485 (STM32H7[R,]Sxx) RevID:0x1003
Core Arch. : M7 - FPU used
HAL version : 0x01010000
SYSCLK clock : 600 MHz
HCLK clock : 300 MHz
FLASH conf. : ACR=0x00000037 - latency=7
CACHE conf. : $I/$D=(True,True)
[20:41:39.379] RX: Timestamp : SysTick + DWT (delay(1)=1.000 ms)
AI platform (API 1.1.0 - RUNTIME 9.0.0)
Discovering the network(s)...
Found network "network"
Creating the network "network"..
Initializing the network
Network informations...
model name : network
model signature : 0x1e108c42827f4c62598744246d259703
model datetime : Sun Nov 3 20:31:53 2024
compile datetime : Nov 3 2024 20:32:53
tools version : 9.0.0
complexity : 8520 MACC
c-nodes : 5
map_activations : 1
[0] @0x24000D60/1024
map_weights : 1
[0] @0x70013060/11008
n_inputs/n_outputs : 1/1
I[0] (1,8,8,2)128/float32 @0x24000DE0/512
O[0] (1,1,1,8)8/float32 @0x24000D60/32
-------------------------------------------
| READY to receive a CMD from the HOST... |
-------------------------------------------
# Note: At this point, default ASCII-base terminal should be closed
# and a serial COM interface should be used
# (i.e. Python ai_runner module). Protocol version = 3.1
Seeing this means our code is running successfully.
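The addresses in the boot log also show the effect of the `--allocate-inputs --allocate-outputs` options: the input and output buffers sit inside the activations buffer instead of using extra RAM. A small sketch that checks this, with addresses and sizes copied from the log above; the helper `contains` is hypothetical.

```python
def contains(outer, inner):
    """True if the (address, size) region `inner` lies fully inside `outer`."""
    outer_addr, outer_size = outer
    inner_addr, inner_size = inner
    return outer_addr <= inner_addr and inner_addr + inner_size <= outer_addr + outer_size


# (address, size in bytes) pairs from the "Network informations" section.
ACTIVATIONS = (0x24000D60, 1024)
WEIGHTS = (0x70013060, 11008)
INPUT = (0x24000DE0, 512)    # (1,8,8,2) float32 = 128 values * 4 bytes
OUTPUT = (0x24000D60, 32)    # (1,1,1,8) float32 = 8 values * 4 bytes

if __name__ == "__main__":
    print("input inside activations: ", contains(ACTIVATIONS, INPUT))
    print("output inside activations:", contains(ACTIVATIONS, OUTPUT))
    print("input offset in buffer:   ", INPUT[0] - ACTIVATIONS[0], "bytes")
    print("weights inside activations:", contains(ACTIVATIONS, WEIGHTS))
```

The weights live at a completely different address (0x70013060), which matches the analysis report listing them as a separate read-only segment.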
Now close the serial terminal tool, go back to CubeMX, and click validate on target.
Select the board's serial port and use the default baud rate of 115200.
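CubeMX passes this choice to the tool as a descriptor string; in the validation log below it appears as `--desc serial:COM49:115200`. A minimal sketch that splits such a descriptor and estimates how long the raw input data takes to cross the link. The helper names are hypothetical, and the estimate assumes 8N1 framing (10 bits on the wire per byte) and ignores protocol overhead.

```python
from typing import NamedTuple


class SerialDesc(NamedTuple):
    driver: str
    port: str
    baudrate: int


def parse_desc(desc: str) -> SerialDesc:
    """Split a descriptor of the form 'serial:<port>:<baudrate>'."""
    parts = desc.split(":")
    if len(parts) != 3 or parts[0] != "serial":
        raise ValueError(f"unexpected descriptor: {desc}")
    return SerialDesc(parts[0], parts[1], int(parts[2]))


def transfer_seconds(n_bytes: int, baudrate: int, bits_per_byte: int = 10) -> float:
    """Rough transfer time; 10 bits per byte assumes 8N1 framing."""
    return n_bytes * bits_per_byte / baudrate


if __name__ == "__main__":
    desc = parse_desc("serial:COM49:115200")
    print(desc)
    # 10 random samples of 512 bytes each, as in the validation log.
    print(f"{transfer_seconds(10 * 512, desc.baudrate):.3f} s")
```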
Wait for the board and the host PC to finish exchanging data and complete the test.
The test result is as follows.
Starting AI validation on target with random data...
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe validate --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171943553747800013843934305667415686 --output C:/Users/Administrator/.stm32cubemx/network_output --mode target --desc serial:COM49:115200
STEdgeAI Core v9.0.0-19802
Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 8, 8, 2)/float32, min/max=[0.005, 1.000], mean/std=[0.498, 0.294], input_1
No output/reference samples are provided
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171943553747800013843934305667415686\network_c_info.json
Exec/report summary (validate)
----------------------------------------------------------------------------------------------------------------------------------------
model file : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type : keras
c_name : network
compression : none
options : allocate-inputs, allocate-outputs
optimization : balanced
target/series : stm32h7
workspace dir : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171943553747800013843934305667415686
output dir : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt : float
model_name : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash : 0x1e108c42827f4c62598744246d259703
params # : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1 : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1 : 'dense_1', f32(1x8), 32 Bytes, activations
macc : 8,520
weights (ro) : 11,008 B (10.75 KiB) (1 segment)
activations (rw) : 1,024 B (1024 B) (1 segment) *
ram (total) : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Running the Keras model...
Running the STM AI c-model (AI RUNNER)...(name=network, mode=TARGET)
INTERNAL ERROR: E801(HwIOError): Invalid firmware - COM49:115200
Validation ended