[XuanTie Cup 3rd RISC-V Application Innovation Contest] LicheePi 4A: HHB Tool Basics and a MobileNetV2 Test

DDZZ669 · Posted 2023-10-28 16:34

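Running AI models on the LicheePi 4A board requires deploying them with the HHB tool, which the previous post already used when testing Yolov5n.

Since I am still not very familiar with HHB, this post follows the HHB user manual (<https://www.yuque.com/za4k4z/oxlbxl>) to try the tool hands-on, then compiles MobileNetV2 with HHB, runs it on the board, and walks through the logic of the demo code.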

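# 1 HHB Quick Start

HHB (Heterogeneous Honey Badger tools collection) is a set of neural network model deployment tools provided by T-HEAD for the Wujian SoC platform.

It covers the tools needed during deployment, including compilation and optimization, performance profiling, debugging, and result simulation.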

![](https://xxpcb-1259761082.cos.ap-shanghai.myqcloud.com/pic2/LiPi4A/9/1.png)

## 1.1 Checking the HHB installation

HHB is preinstalled in the /tools directory of the Docker image. After entering the container, confirm it with the hhb --version command:

```sh
xxpcb@xxpcb-ubuntu20:~/Desktop$ sudo docker restart your.hhb2.4
[sudo] password for xxpcb: 
your.hhb2.4
xxpcb@xxpcb-ubuntu20:~/Desktop$ sudo docker exec -it your.hhb2.4 /bin/bash
root@9b266aa1e9ac:/# hhb --version
HHB version: 2.4.5, build 20230703
root@9b266aa1e9ac:/# 
```

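## 1.2 A look at the run script

Taking the deployment of mobilenet on the XuanTie c906 as an example, /home/example/basic/c906/onnx_mobilenetv2/ already contains a complete run.sh that describes the whole process.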

```sh
root@9b266aa1e9ac:~# ls
root@9b266aa1e9ac:~# cd /home/example/basic/c906/onnx_mobilenetv2/
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls
deploy.py  persian_cat.jpg  run.sh
```

Contents of run.sh:

```sh
#!/bin/bash -x

hhb -S --model-file ../../model/mobilenetv2-12.onnx  --data-scale 0.017 --data-mean "124 117 104" --board c906 --input-name "input" --output-name "output" --input-shape "1 3 224 224" --postprocess save_and_top5 --simulate-data persian_cat.jpg
```

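## 1.3 Parameter meanings

The first argument after hhb, -S, is a flow-control option. All of its possible values are listed below.

### 1.3.1 Flow-control options

| Flow-control option | Full name | Meaning                                                   |
| ------------------- | --------- | --------------------------------------------------------- |
| -E                  | import    | Stop after the import stage                               |
| -Q                  | quantize  | Stop after the quantize stage                             |
| -C                  | codegen   | Stop after the codegen stage                              |
| -D                  | deploy    | Stop after building the executable for the target platform |
| -S                  | simulate  | Run through to executing the result in the simulator      |

The remaining arguments are general options.

### 1.3.2 General options

| Option          | Value                           | Meaning                                                      |
| --------------- | ------------------------------- | ------------------------------------------------------------ |
| --model-file    | ../../model/mobilenetv2-12.onnx | Location of the model file, here mobilenetv2-12.onnx         |
| --data-scale    | 0.017                           | Scale factor applied to the input values; default 1, here 0.017 |
| --data-mean     | "124 117 104"                   | Per-channel input mean; values are separated by spaces (or semicolons) |
| --board         | c906                            | Target platform (one of th1520, c906, c908, c920, c920v2, x86_ref) |
| --input-name    | "input"                         | Name of the model's input node                               |
| --output-name   | "output"                        | Name of the model's output node                              |
| --input-shape   | "1 3 224 224"                   | Shape of the model's input node                              |
| --postprocess   | save_and_top5                   | Post-processing type; here the output is saved to a file and the 5 largest output values are printed |
| --simulate-data | persian_cat.jpg                 | Path to the data used for simulated execution                |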
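To make --data-mean and --data-scale concrete, here is a minimal sketch of the per-channel normalization they describe. It follows the order used by the demo's main.cpp analyzed later in this post (subtract the mean, then multiply by the scale); I am assuming HHB's built-in preprocessing does the same. The function name `normalize_rgb` is my own, not part of HHB.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Normalize interleaved R,G,B values in place: out = (in - mean[c]) * scale.
// `mean` and `scale` mirror --data-mean "124 117 104" and --data-scale 0.017.
// Hypothetical helper for illustration; not an HHB API.
void normalize_rgb(std::vector<float>& pixels,
                   const std::array<float, 3>& mean,
                   float scale) {
  assert(pixels.size() % 3 == 0);  // expects complete R,G,B triples
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    pixels[i] = (pixels[i] - mean[i % 3]) * scale;
  }
}
```

With these values a pixel equal to the mean, (124, 117, 104), maps to (0, 0, 0), and a channel value 100 above its mean maps to about 1.7.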

## 1.4 Running the script

### 1.4.1 Console output during the run

Running the script prints the following:

```sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ./run.sh 
+ hhb -S --model-file ../../model/mobilenetv2-12.onnx --data-scale 0.017 --data-mean '124 117 104' --board c906 --input-name input --output-name output --input-shape '1 3 224 224' --postprocess save_and_top5 --simulate-data persian_cat.jpg
[2023-10-28 01:45:25] (HHB LOG): Start import model.
[2023-10-28 01:45:31] (HHB LOG): Model import completed! 
[2023-10-28 01:45:31] (HHB LOG): Start quantization.
[2023-10-28 01:45:31] (HHB LOG): Start optimization.
[2023-10-28 01:45:31] (HHB LOG): Optimization completed!
[2023-10-28 01:45:31] (HHB LOG): Start conversion to csinn.
[2023-10-28 01:45:32] (HHB LOG): Conversion completed!
[2023-10-28 01:45:32] (HHB LOG): Start operator fusion.
[2023-10-28 01:45:32] (HHB LOG): Operator fusion completed!
[2023-10-28 01:45:32] (HHB LOG): Start operator split.
[2023-10-28 01:45:32] (HHB LOG): Operator split completed!
[2023-10-28 01:45:32] (HHB LOG): Start layout convert.
[2023-10-28 01:45:32] (HHB LOG): Layout convert completed!
[2023-10-28 01:45:32] (HHB LOG): Quantization completed!
[2023-10-28 01:45:46] (HHB LOG): cd hhb_out; qemu-riscv64 -cpu c906fdv hhb_runtime ./hhb.bm persian_cat.jpg.0.bin 
Run graph execution time: 10309.05469ms, FPS=0.10

=== tensor info ===
shape: 1 3 224 224 
data pointer: 0x25ec40

=== tensor info ===
shape: 1 1000 
data pointer: 0x1cace0
The max_value of output: 16.000000
The min_value of output: -7.933594
The mean_value of output: 0.001621
The std_value of output: 8.914919
 ============ top5: ===========
283: 16.000000
281: 13.976562
287: 12.195312
282: 11.421875
285: 11.335938
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# 
```

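### 1.4.2 Output files after the run

The command generates a sample program in the hhb_out subdirectory of the current directory and prints the reference result for the sample image: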

```sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls
deploy.py  hhb_out  persian_cat.jpg  run.sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls hhb_out/
hhb.bm  hhb_runtime  io.c  io.h  main.c  main.o  model.c  model.o  model.params  persian_cat.jpg.0.bin  persian_cat.jpg.0.bin_output0_1_1000.txt  process.c  process.h
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# 
```

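The files generated in hhb_out are as follows.

Files needed to run on the board:

- hhb_runtime: the generated executable
- hhb.bm: the model weights used by the sample program
- persian_cat.jpg.0.bin: the preprocessed input data

Other files:

- main.c: reference entry point of the sample program
- model.c: model structure file
- model.params: model weight data
- io.c: helper functions for reading and writing files
- io.h: declarations of the file helper functions
- process.c: image preprocessing functions
- process.h: declarations of the image preprocessing functions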

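### 1.4.3 Running on the board

Copy these files to any directory on a c906 development board running Linux, then run the following command to get the mobilenet v2 classification result: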

```sh
./hhb_runtime hhb.bm persian_cat.jpg.0.bin
```

I don't have a c906 board, so this step is not demonstrated here.

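## 1.5 Performance profiling

The profiler subcommand is a performance analysis tool built into HHB. It analyzes performance hotspots of a network model, such as compute load and memory reads/writes, and provides quantitative data for optimizing the model on a specific platform.

The following command reports the theoretical compute cost of mobilenetv2: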

```sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# hhb profiler --model-file ../../model/mobilenetv2-12.onnx --indicator all -in "input" -on "output" -is "1 3 224 224"
Toal profiler information as follows:
Total calculation amount: macc=300774272, flops=601827648
Total memory(float): params=13947264 bytes, output=54308672 bytes.
Total ddr: accum_ddr=0 bytes, coeff_ddr=0 bytes,
           input_ddr=0 bytes, output_ddr=0 bytes.
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# 
```

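Parameter description:

| Option            | Value                           | Meaning                                                      |
| ----------------- | ------------------------------- | ------------------------------------------------------------ |
| --model-file      | ../../model/mobilenetv2-12.onnx | The model file to profile                                    |
| --indicator       | all                             | Which information to collect; cal collects the compute cost only |
| -in/--input-name  | "input"                         | Name of the model's input node; separate multiple inputs with semicolons |
| -on/--output-name | "output"                        | Name of the model's output node; separate multiple outputs with semicolons |
| -is/--input-shape | "1 3 224 224"                   | Shape of the model's input node; separate multiple inputs with semicolons |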
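To get a feel for where the macc and params numbers come from, here is a rough sketch of how such figures are usually computed by hand. It is the textbook formula, not HHB's actual counting code, and the function names are my own. As a worked case I assume the standard MobileNetV2 stem: a 3×3 convolution from 3 to 32 channels with stride 2, giving a 112×112 output.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-accumulate count of a standard 2-D convolution (bias ignored):
// every output element needs kernel_h * kernel_w * in_channels MACs.
// Hypothetical helper for illustration; not part of the HHB profiler.
std::uint64_t conv2d_macc(std::uint64_t kernel_h, std::uint64_t kernel_w,
                          std::uint64_t in_channels, std::uint64_t out_channels,
                          std::uint64_t out_h, std::uint64_t out_w) {
  assert(kernel_h > 0 && kernel_w > 0);
  return kernel_h * kernel_w * in_channels * out_channels * out_h * out_w;
}

// Number of float32 parameters implied by a size in bytes (4 bytes each).
std::uint64_t float32_param_count(std::uint64_t bytes) {
  assert(bytes % 4 == 0);
  return bytes / 4;
}
```

Under that assumption the stem alone accounts for `conv2d_macc(3, 3, 3, 32, 112, 112)` = 10,838,016 of the 300,774,272 MACs reported above, and `float32_param_count(13947264)` gives 3,486,816 parameters. The reported flops value (601,827,648) is slightly more than twice the macc value, so the profiler apparently counts some non-MAC operations as well; that is my reading of the numbers, not something the manual states.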

# 2 Other HHB topics

## 2.1 Models and platforms

HHB can deploy models exported from several training frameworks to several target platforms:

![](https://xxpcb-1259761082.cos.ap-shanghai.myqcloud.com/pic2/LiPi4A/9/2.png)

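Currently supported model types: caffemodel, pb, onnx, tflite

- ONNX models use the .onnx file suffix
- Models exported from Caffe come as two files, with the .caffemodel and .prototxt suffixes
- pb models are exported from the TensorFlow framework
- tflite models use the .tflite suffix and are usually converted from models trained with TensorFlow

Currently supported target platforms: C906, C908, C920, TH1520, x86_ref

- c906 is an energy-efficient vector-capable CPU in the XuanTie 9 series; the D1 mainly relies on the C906's RISC-V vector instructions to accelerate model inference
- TH1520 is the first mass-production multimodal AI processor SoC prototype designed on the Wujian 600 SoC platform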
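## 2.2 Quantization schemes

Models are usually trained in float32, while XuanTie platforms target embedded and edge computing, so a model generally has to be quantized before it can run correctly on a XuanTie platform.

| Quantization scheme                                        | Parameter name  | Notes                                                        |
| ---------------------------------------------------------- | --------------- | ------------------------------------------------------------ |
| int8 symmetric                                             | int8_sym        | Generally not recommended; larger accuracy loss than the other schemes |
| int8 asymmetric                                            | int8_asym       |                                                              |
| uint8 asymmetric                                           | uint8_asym      |                                                              |
| int16 quantization                                         | int16_sym       |                                                              |
| float16 quantization                                       | float16         | Usually the most accurate scheme, and the recommended one for RISC-V CPUs |
| int8 asymmetric inputs/outputs with int8 symmetric weights | int8_asym_w_sym | Common in quantized TFLite models                            |
| float16 inputs/outputs with int8 symmetric weights         | float16_w_int8  | A scheme specific to HHB that balances accuracy and weight size |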
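For readers new to the terminology, here is a minimal sketch of what "int8 asymmetric" quantization means in general: a float range is mapped onto [-128, 127] with a scale and a zero point. This is the textbook affine scheme, written from scratch for illustration; HHB's actual calibration and rounding may differ, and all names below are my own.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Affine (asymmetric) int8 quantization parameters for one tensor.
struct QuantParams {
  float scale;
  std::int32_t zero_point;
};

// Derive scale and zero point so that [min_val, max_val] maps onto [-128, 127].
// Hypothetical helper; not HHB's calibration algorithm.
QuantParams choose_int8_asym(float min_val, float max_val) {
  assert(min_val < max_val);
  // Widen the range to include 0 so that 0.0 is exactly representable.
  min_val = std::min(min_val, 0.0f);
  max_val = std::max(max_val, 0.0f);
  const float scale = (max_val - min_val) / 255.0f;
  const std::int32_t zero_point =
      static_cast<std::int32_t>(std::lround(-128.0f - min_val / scale));
  return {scale, zero_point};
}

// q = round(x / scale) + zero_point, saturated to the int8 range.
std::int8_t quantize(float x, const QuantParams& p) {
  const long q = std::lround(x / p.scale) + p.zero_point;
  return static_cast<std::int8_t>(std::max(-128L, std::min(127L, q)));
}

// x ≈ scale * (q - zero_point)
float dequantize(std::int8_t q, const QuantParams& p) {
  return p.scale * static_cast<float>(q - p.zero_point);
}
```

A symmetric scheme is the special case where the zero point is fixed at 0, which wastes part of the int8 range when the data is not centered around zero.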

## 2.3 CPU simulator

The RISC-V CPU simulator can run already-compiled XuanTie CPU executables in simulation.

![](https://xxpcb-1259761082.cos.ap-shanghai.myqcloud.com/pic2/LiPi4A/9/3.png)

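- QEMU: for XuanTie platforms that run inference with CPU instructions, the executable can be simulated directly with QEMU
- Development board: the model is compiled into an executable and run on a physical XuanTie development board

Taking MobileNet on the c906 platform as an example, the following command converts the model into an executable for the CPU: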

```sh
hhb -D --model-file ../../model/mobilenetv2-12.onnx  --data-scale 0.017 --data-mean "124 117 104" --board c906 --input-name "input" --output-name "output" --input-shape "1 3 224 224" --postprocess save_and_top5 
```

Before testing, you can delete the previous hhb_out directory. The result is as follows:

```sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls
deploy.py  hhb_out  persian_cat.jpg  run.sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# rm -rf hhb_out/
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls
deploy.py  persian_cat.jpg  run.sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# hhb -D --model-file ../../model/mobilenetv2-12.onnx  --data-scale 0.017 --data-mean "124 117 104" --board c906 --input-name "input" --output-name "output" --input-shape "1 3 224 224" --postprocess save_and_top5 
[2023-10-28 06:33:36] (HHB LOG): Start import model.
[2023-10-28 06:33:37] (HHB LOG): Model import completed! 
[2023-10-28 06:33:37] (HHB LOG): Start quantization.
[2023-10-28 06:33:37] (HHB LOG): Start optimization.
[2023-10-28 06:33:37] (HHB LOG): Optimization completed!
[2023-10-28 06:33:37] (HHB LOG): Start conversion to csinn.
[2023-10-28 06:33:38] (HHB LOG): Conversion completed!
[2023-10-28 06:33:38] (HHB LOG): Start operator fusion.
[2023-10-28 06:33:38] (HHB LOG): Operator fusion completed!
[2023-10-28 06:33:38] (HHB LOG): Start operator split.
[2023-10-28 06:33:38] (HHB LOG): Operator split completed!
[2023-10-28 06:33:38] (HHB LOG): Start layout convert.
[2023-10-28 06:33:38] (HHB LOG): Layout convert completed!
[2023-10-28 06:33:38] (HHB LOG): Quantization completed!
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls
deploy.py  hhb_out  persian_cat.jpg  run.sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls hhb_out/
hhb.bm  hhb_runtime  io.c  io.h  main.c  main.o  model.c  model.o  model.params  process.c  process.h
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# 
```

Simulate execution on the c906 CPU with the RISC-V CPU simulator:

```sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# qemu-riscv64 -cpu c906fdv hhb_out/hhb_runtime hhb_out/hhb.bm persian_cat.jpg
Run graph execution time: 10032.33398ms, FPS=0.10

=== tensor info ===
shape: 1 3 224 224 
data pointer: 0x1cace0

=== tensor info ===
shape: 1 1000 
data pointer: 0x25a7c0
The max_value of output: 16.046875
The min_value of output: -7.957031
The mean_value of output: 0.001386
The std_value of output: 8.905679
 ============ top5: ===========
283: 16.046875
281: 13.945312
287: 12.203125
282: 11.437500
285: 11.296875
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2# ls
deploy.py  hhb_out  persian_cat.jpg  persian_cat.jpg_output0_1_1000.txt  run.sh
root@9b266aa1e9ac:/home/example/basic/c906/onnx_mobilenetv2#
```

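# 3 Testing MobileNetV2 on the board

All of the hhb operations above were run inside Docker on an Ubuntu virtual machine, with the C906 platform simulated.

This section tests MobileNetV2 on the actual LicheePi 4A board (TH1520 platform).

## 3.1 Preparation

First, place the mobilenetv2-12.onnx model in the example directory /home/example/th1520_npu/onnx_mobilenetv2_c++.

Option 1: download it from GitHub: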

![](https://xxpcb-1259761082.cos.ap-shanghai.myqcloud.com/pic2/LiPi4A/9/4.png)

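Option 2: the model already exists under /home/example/basic/model, so it can simply be copied over.

Then, in /home/example/th1520_npu, download the library files required by the optimized OpenCV build from GitHub: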

```sh
git clone https://github.com/zhangwm-pt/prebuilt_opencv.git
```

Session log:

```sh
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# ls
main.cpp  persian_cat.jpg  synset.txt
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# cp /home/example/basic/model/mobilenetv2-12.onnx ./
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# ls
main.cpp  mobilenetv2-12.onnx  persian_cat.jpg  synset.txt
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# cd ..
root@9b266aa1e9ac:/home/example/th1520_npu# ls
onnx_mobilenetv2_c++  yolov5n
root@9b266aa1e9ac:/home/example/th1520_npu# git clone https://github.com/zhangwm-pt/prebuilt_opencv.git
Cloning into 'prebuilt_opencv'...
remote: Enumerating objects: 461, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 461 (delta 0), reused 4 (delta 0), pack-reused 457
Receiving objects: 100% (461/461), 47.56 MiB | 81.00 KiB/s, done.
Resolving deltas: 100% (78/78), done.
Updating files: 100% (395/395), done.
root@9b266aa1e9ac:/home/example/th1520_npu# ls
onnx_mobilenetv2_c++  prebuilt_opencv  yolov5n
root@9b266aa1e9ac:/home/example/th1520_npu# 
```

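## 3.2 Compiling with HHB

Cross-compile the ONNX model into a program that can run on the NPU. The NPU only supports 8-bit or 16-bit fixed-point arithmetic; this example uses int8 asymmetric quantization. The command is: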

```sh
cd /home/example/th1520_npu/onnx_mobilenetv2_c++
hhb -D --model-file mobilenetv2-12.onnx --data-scale 0.017 --data-mean "124 117 104"  --board th1520  --postprocess save_and_top5 --input-name "input" --output-name "output" --input-shape "1 3 224 224" --calibrate-dataset persian_cat.jpg  --quantization-scheme "int8_asym"
```

To compile for the CPU instead, the command is:

```sh
hhb -D --model-file mobilenetv2-12.onnx --data-scale 0.017 --data-mean "124 117 104"  --board c920  --postprocess save_and_top5 --input-name "input" --output-name "output" --input-shape "1 3 224 224"
```

Session log:

```sh
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# 
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# hhb -D --model-file mobilenetv2-12.onnx --data-scale 0.017 --data-mean "124 117 104"  --board th1520  --postprocess save_and_top5 --input-name "input" --output-name "output" --input-shape "1 3 224 224" --calibrate-dataset persian_cat.jpg  --quantization-scheme "int8_asym"
[2023-10-28 07:12:08] (HHB LOG): Start import model.
[2023-10-28 07:12:09] (HHB LOG): Model import completed! 
[2023-10-28 07:12:09] (HHB LOG): Start quantization.
[2023-10-28 07:12:09] (HHB LOG): get calibrate dataset from persian_cat.jpg
[2023-10-28 07:12:09] (HHB LOG): Start optimization.
[2023-10-28 07:12:10] (HHB LOG): Optimization completed!
Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 153/153 [00:20<00:00,  7.52it/s]
[2023-10-28 07:12:30] (HHB LOG): Start conversion to csinn.
[2023-10-28 07:12:30] (HHB LOG): Conversion completed!
[2023-10-28 07:12:30] (HHB LOG): Start operator fusion.
[2023-10-28 07:12:31] (HHB LOG): Operator fusion completed!
[2023-10-28 07:12:31] (HHB LOG): Start operator split.
[2023-10-28 07:12:31] (HHB LOG): Operator split completed!
[2023-10-28 07:12:31] (HHB LOG): Start layout convert.
[2023-10-28 07:12:31] (HHB LOG): Layout convert completed!
[2023-10-28 07:12:31] (HHB LOG): Quantization completed!
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# ls
hhb_out  main.cpp  mobilenetv2-12.onnx  persian_cat.jpg  synset.txt
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# 
```

When the command finishes, an hhb_out subdirectory is generated in the current directory.

## 3.3 Compiling with g++

Next, compile the demo with g++:

```sh
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# riscv64-unknown-linux-gnu-g++ main.cpp -I../prebuilt_opencv/include/opencv4 -L../prebuilt_opencv/lib   -lopencv_imgproc   -lopencv_imgcodecs -L../prebuilt_opencv/lib/opencv4/3rdparty/ -llibjpeg-turbo -llibwebp -llibpng -llibtiff -llibopenjp2    -lopencv_core -ldl  -lpthread -lrt -lzlib -lcsi_cv -latomic -static -o mobilenetv2_example
/tools/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1-light.1/bin/../lib/gcc/riscv64-unknown-linux-gnu/10.2.0/../../../../riscv64-unknown-linux-gnu/bin/ld: ../prebuilt_opencv/lib/libopencv_core.a(filesystem.cpp.o): in function `.L0 ':
filesystem.cpp:(.text._ZN2cv6plugin4impl10DynamicLib11libraryLoadERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x4e): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# ls
hhb_out  main.cpp  mobilenetv2-12.onnx  mobilenetv2_example  persian_cat.jpg  synset.txt
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# 
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# file mobilenetv2_example 
mobilenetv2_example: ELF 64-bit LSB executable, UCB RISC-V, version 1 (GNU/Linux), statically linked, for GNU/Linux 4.15.0, with debug_info, not stripped
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# 
```

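When compilation finishes, the mobilenetv2_example executable is generated.

## 3.4 Running on the board

As in the previous Yolov5n test, copy the whole directory to the board. scp works for this; the command I used was: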

```sh
root@9b266aa1e9ac:/home/example/th1520_npu/onnx_mobilenetv2_c++# cd ..
root@9b266aa1e9ac:/home/example/th1520_npu# ls
onnx_mobilenetv2_c++  prebuilt_opencv  yolov5n
root@9b266aa1e9ac:/home/example/th1520_npu# scp -r onnx_mobilenetv2_c++/ sipeed@192.168.5.110:~/Desktop/tfcard
sipeed@192.168.5.110's password: 
synset.txt                       100%   31KB 530.0KB/s   00:00    
main.cpp                         100% 3705   311.8KB/s   00:00    
persian_cat.jpg                  100%  351KB   3.4MB/s   00:00    
mobilenetv2-12.onnx              100%   13MB   3.6MB/s   00:03    
mobilenetv2_example              100%   12MB   3.3MB/s   00:03    
io.c                             100% 4945   428.1KB/s   00:00    
hhb_jit                          100%  322KB   3.4MB/s   00:00    
jit.o                            100%   13KB   1.1MB/s   00:00    
graph_info.bin                   100%  361    46.6KB/s   00:00    
process.h                        100% 2039   246.7KB/s   00:00    
hhb_th1520_x86_jit               100%  157KB   3.5MB/s   00:00    
model.o                          100%  194KB   3.8MB/s   00:00    
hhb.bm                           100% 3472KB   4.0MB/s   00:00    
main.o                           100%   47KB   1.8MB/s   00:00    
input.0.bin                      100%  588KB   3.7MB/s   00:00    
main.c                           100% 7404   859.9KB/s   00:00    
model.c                          100%  138KB   3.4MB/s   00:00    
process.c                        100%   20KB   2.3MB/s   00:00    
hhb_runtime                      100%  732KB   3.7MB/s   00:00    
model.params                     100% 3464KB   3.8MB/s   00:00    
hhb_th1520_x86_runtime           100%  712KB   3.9MB/s   00:00    
jit.c                            100% 1942   214.5KB/s   00:00    
io.h                             100% 1538   177.2KB/s   00:00    
input.0.tensor                   100% 2794KB   3.9MB/s   00:00    
root@9b266aa1e9ac:/home/example/th1520_npu# 
```

Then, on the board, enter the Python virtual environment configured earlier and run the executable. Result:

![](https://xxpcb-1259761082.cos.ap-shanghai.myqcloud.com/pic2/LiPi4A/9/5.png)

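# 4 MobileNetV2 code analysis

## 4.1 Main program

The main program in main.cpp is shown below. It has three parts:

- load_image_and_preprocess loads the image and preprocesses it, producing input_img.bin
- hhb_runtime is invoked with hhb.bm and input_img.bin as arguments
- load_result_and_postprocess post-processes the result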

```c++
int main()
{
  std::cout << " ********** preprocess image **********" << std::endl;
  load_image_and_preprocess();

  std::cout << " ********** run mobilenetv2 **********" << std::endl;
  system("./hhb_out/hhb_runtime ./hhb_out/hhb.bm input_img.bin");

  std::cout << " ********** postprocess result **********" << std::endl;
  load_result_and_postprocess();

  return 0;
}
```
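One thing worth noting is that the demo ignores the return value of system(), so post-processing runs even if hhb_runtime failed to start. Below is a small sketch of how the call could be checked on Linux. The wrapper name `run_command` is my own, and it relies on the POSIX macros from `<sys/wait.h>`; it is a suggestion, not how the demo is written.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/wait.h>  // WIFEXITED / WEXITSTATUS (POSIX)

// Run a shell command and return its exit code, or -1 if the command could
// not be started or was terminated by a signal.
// Hypothetical helper for illustration; not part of the demo.
int run_command(const std::string& command) {
  assert(!command.empty());
  const int status = std::system(command.c_str());
  if (status == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

In main() this could be used as `if (run_command("./hhb_out/hhb_runtime ./hhb_out/hhb.bm input_img.bin") != 0) return 1;` so that a failed inference does not lead to reading a stale or missing result file.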

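## 4.2 Image preprocessing

The preprocessing logic consists of:

- reading the jpg with OpenCV
- resizing the image and center-cropping it to 224 × 224
- converting it to a float image
- converting BGR to RGB
- applying mean and scale
- saving the result as input_img.bin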

```c++
void load_image_and_preprocess()
{
  // load image
  Mat origin_img = imread("persian_cat.jpg");

  int image_width = 224;
  int image_height = 224;
  int image_channel = 3;

  // resize image to 224 * ? or ? * 224
  cv::Mat resized_img;
  float ratio = 224 / fmin(origin_img.size().width, origin_img.size().height);
  cv::resize(origin_img, resized_img, cv::Size(), ratio, ratio);

  // Crop image center 224 * 224
  int start_x = resized_img.size().width / 2 - image_width / 2;
  int start_y = resized_img.size().height / 2 - image_height / 2;
  cv::Rect crop_region(start_x, start_y, image_width, image_height);
  cv::Mat cropped_image = resized_img(crop_region);

  // convert to float
  cv::Mat float_img;
  cropped_image.convertTo(float_img, CV_32F);

  // bgr to rgb
  cv::cvtColor(float_img, float_img, cv::COLOR_BGR2RGB);

  // mean
  float_img -= Scalar(124, 117, 104);

  // scale
  float_img *= 0.017;

  // save to file
  FILE* fp = fopen("input_img.tensor", "w");
  FILE* bfp = fopen("input_img.bin", "w");

  float *f32_ptr = float_img.ptr<float>(0);
  float float_data[image_channel * image_width * image_height];

  // layout to be CHW
  for (int k = 0; k < image_channel ; k++) {
    for (int i = 0; i < image_width * image_height; i++) {
      float point = f32_ptr[k + i * image_channel];
      fprintf(fp, "%f\n", point);
      float_data[k * image_width * image_height + i] = point;
    }
  }

  fwrite(float_data, sizeof(float), image_channel * image_width * image_height, bfp);

  fclose(fp);
  fclose(bfp);
}
```
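The "layout to be CHW" loop is the least obvious part of this function: OpenCV stores pixels interleaved (HWC: R,G,B,R,G,B,...), while the model input shape "1 3 224 224" is planar (CHW). Here is a standalone sketch of the same index mapping, written without OpenCV so it can be tried on a tiny buffer. The function name `hwc_to_chw` is my own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert an interleaved HWC buffer (R,G,B,R,G,B,...) to planar CHW
// (all R values, then all G values, then all B values).
// Hypothetical helper mirroring the loop in the demo; not part of it.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc,
                              std::size_t height, std::size_t width,
                              std::size_t channels) {
  assert(hwc.size() == height * width * channels);
  const std::size_t plane = height * width;
  std::vector<float> chw(hwc.size());
  for (std::size_t c = 0; c < channels; ++c) {
    for (std::size_t i = 0; i < plane; ++i) {
      chw[c * plane + i] = hwc[i * channels + c];
    }
  }
  return chw;
}
```

For a 1×2 image with pixels (1, 2, 3) and (4, 5, 6), the result is 1, 4, 2, 5, 3, 6. Using std::vector also avoids the roughly 588 KB variable-length array that the demo places on the stack (3 × 224 × 224 floats).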

## 4.3 Running mobilenetv2

This corresponds to the following line in the code:

```c++
system("./hhb_out/hhb_runtime ./hhb_out/hhb.bm input_img.bin");
```

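hhb_runtime is generated when hhb compiles the model:

- hhb_runtime: the generated executable
- hhb.bm: the model weights used by the sample program
- input_img.bin: the input data produced by the preprocessing step

After it runs, it should write the recognition result to input_img.bin_output0_1_1000.txt.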
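The demo reads this file back with a helper called get_data_from_file, whose body is not shown in this post. As a rough idea of what such a reader involves, here is a sketch that assumes the file holds whitespace-separated float values in text form (I have not verified the exact format HHB writes). The names `read_floats` and `read_floats_from_file` are my own.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying read_floats without a file
#include <string>
#include <vector>

// Read up to `count` whitespace-separated floats from a text stream.
// Hypothetical stand-in for the demo's get_data_from_file.
std::vector<float> read_floats(std::istream& in, std::size_t count) {
  std::vector<float> values;
  values.reserve(count);
  float v = 0.0f;
  while (values.size() < count && (in >> v)) {
    values.push_back(v);
  }
  return values;
}

// File-path wrapper; returns an empty vector if the file cannot be opened.
std::vector<float> read_floats_from_file(const std::string& path,
                                         std::size_t count) {
  assert(count > 0);
  std::ifstream file(path);
  if (!file) {
    return {};
  }
  return read_floats(file, count);
}
```

A caller would check that `read_floats_from_file("input_img.bin_output0_1_1000.txt", 1000).size() == 1000` before going on to pick the top classes.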

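## 4.4 Post-processing

The post-processing logic reads the result file, picks the top-5 classes, and prints their labels from synset.txt: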

```c++
static void get_top5(float *buf, uint32_t size, float *prob, uint32_t *cls);
static float* get_data_from_file(const char* filename, uint32_t size);
void load_result_and_postprocess()
{
  uint32_t i = 0, size = 1000;
  uint32_t cls[5];
  float prob[5];

  float* output_data = get_data_from_file("input_img.bin_output0_1_1000.txt", 1000); // read the result file

  get_top5(output_data, size, prob, cls); // pick the top-5 classes and their scores

  std::ifstream infile;
  infile.open("synset.txt");
  std::vector<std::string> labels;
  std::string line;
  while (getline(infile, line))
  {
    labels.push_back(line);
  }

  std::cout << " ********** probability top5: ********** " << std::endl; // print the top-5 labels
  size = size > 5 ? 5 : size;
  for (i = 0; i < size; i++)
  {
    std::cout << labels[cls[i]] << std::endl;
  }
}
```
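get_top5 is only declared in the excerpt above. For reference, here is one way a top-k selection can be written with the standard library. This is my own sketch of the technique (std::partial_sort over an index array), not the demo's implementation, and `top_k_indices` is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Return the indices of the k largest scores, highest score first.
// Hypothetical alternative to the demo's get_top5.
std::vector<std::uint32_t> top_k_indices(const std::vector<float>& scores,
                                         std::size_t k) {
  assert(k > 0);
  k = std::min(k, scores.size());
  std::vector<std::uint32_t> idx(scores.size());
  std::iota(idx.begin(), idx.end(), 0u);  // 0, 1, 2, ...
  // Only the first k positions need to be in sorted order.
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&scores](std::uint32_t a, std::uint32_t b) {
                      return scores[a] > scores[b];
                    });
  idx.resize(k);
  return idx;
}
```

The returned indices are class IDs, so `labels[idx[0]]` would be the best match from synset.txt. Note that the values HHB printed earlier (for example 16.0 for class 283) are raw output scores rather than probabilities.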

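# 5 Summary

This post covered the basics of the HHB tool with a hands-on record of the test process, then compiled MobileNetV2 with HHB and ran it on the board, and finally analyzed the logic of the demo code.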