[HPMicro HPM6750 Review] Running an Edge AI Framework: TFLM Benchmarks

Posted by xusiwei1236 on 2022-6-25 23:14

Last edited by xusiwei1236 on 2022-6-25 23:40

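<p>This post explains what TFLM is, introduces the official TFLM benchmarks, shows how to run a TFLM benchmark on the HPM6750, and compares the result with the same benchmark on a Raspberry Pi 3B+.</p>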

<h2>What is TFLM?</h2>

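<p>You have probably heard of TensorFlow, the machine learning library developed and open-sourced by Google, which supports both model training and model inference.</p>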

<p>The subject of this post, TFLM, stands for TensorFlow Lite for Microcontrollers. So what is TensorFlow Lite?</p>

<p>TensorFlow Lite (usually shortened to TFLite) is the TensorFlow team's solution for deploying models on mobile devices; informally, it is TensorFlow for phones. Here is how the TensorFlow website describes TFLite:</p>

<blockquote>
<p>TensorFlow Lite is a set of tools that helps developers run models on mobile, embedded, and IoT devices, enabling on-device machine learning.</p>
</blockquote>

<p>TensorFlow Lite for Microcontrollers (TFLM), the topic of this post, is the microcontroller version of TensorFlow Lite. The website describes it as follows:</p>

<blockquote>
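<p>TensorFlow Lite for Microcontrollers (TFLM below) is an experimental port of TensorFlow Lite for microcontrollers and other devices with only a few kilobytes of memory. It runs directly on "bare metal" and requires no operating system support, no standard C/C++ libraries, and no dynamic memory allocation. The core runtime needs only 16 KB on a Cortex-M3, and with enough operators to run a speech keyword detection model it needs only 22 KB.</p>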
</blockquote>
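<p>The "no dynamic memory allocation" point rests on a simple idea: the application reserves one fixed-size buffer up front (TFLM applications usually declare a static array, often named tensor_arena, and pass it to the interpreter), and all tensors are carved out of that buffer. The sketch below illustrates the idea with a toy bump allocator. The class name ArenaAllocator, the 1 KB arena and the 16-byte default alignment are assumptions for illustration; this is not TFLM's actual allocator.</p>

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical toy bump allocator over a caller-provided buffer.
// Memory is handed out linearly and never freed individually;
// Reset() releases everything at once.
class ArenaAllocator {
 public:
  ArenaAllocator(std::uint8_t* buffer, std::size_t size)
      : buffer_(buffer), size_(size), used_(0) {}

  // Returns a pointer aligned to `alignment` (must be a power of two),
  // or nullptr if the arena does not have enough room left.
  void* Allocate(std::size_t bytes, std::size_t alignment = 16) {
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_);
    const std::uintptr_t current = base + used_;
    const std::uintptr_t aligned =
        (current + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
    const std::size_t offset = static_cast<std::size_t>(aligned - base);
    if (offset > size_ || bytes > size_ - offset) return nullptr;
    used_ = offset + bytes;
    return buffer_ + offset;
  }

  void Reset() { used_ = 0; }
  std::size_t used() const { return used_; }

 private:
  std::uint8_t* buffer_;
  std::size_t size_;
  std::size_t used_;
};

// Statically allocated storage: no malloc/new is involved.
alignas(16) static std::uint8_t g_arena[1024];
```

<p>Because memory is only released as a whole, the worst-case footprint is fixed when the arena is declared, which is what makes this style attractive on MCUs.</p>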

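<p>All three come from Google and share a common lineage. TensorFlow supports both training and inference, while TFLite and TFLM support inference only; TFLite targets mobile devices such as phones and tablets, and TFLM targets microcontrollers. Historically, the latter two are spin-off projects of TensorFlow, and the three form a tree: TFLite split off from TensorFlow, TFLite-Micro split off from TFLite, and all three are now developed in parallel. For a long time they were maintained in a single repository, where the TensorFlow source tree contained the other two and the TFLite directory contained tflite-micro.</p>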

<p> </p>

<h2>The TFLM open-source project</h2>

<p>In June 2021, Google moved the TFLM source code out of the main TensorFlow repository into a separate repository.</p>

<p>As of this writing (June 2022), however, TFLite is still maintained as a subdirectory of the TensorFlow project, which suggests how much weight Google gives to TFLM.</p>

<p>TFLM repository: <a href="https://github.com/tensorflow/tflite-micro">https://github.com/tensorflow/tflite-micro</a></p>

<p>Clone command: git clone <a href="https://github.com/tensorflow/tflite-micro.git">https://github.com/tensorflow/tflite-micro.git</a></p>

<p>The core TFLM code lives in the tensorflow\lite\micro subdirectory:</p>


<p>TFLM officially supports building with make and Bazel.</p>

<p> </p>

<h3>TFLM benchmarks</h3>

<p>The top-level README.md of the TFLM repository links to the benchmark documentation:</p>

<p><a href="https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/benchmarks/README.md">https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/benchmarks/README.md</a></p>

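<p>The document is short:</p>

<p>From its contents we can see that TFLM documents two benchmarks (there are actually three):</p>

<ul>
<li>Keyword benchmark
<ul>
<li>The keyword benchmark feeds the model random data generated at run time, so its output is meaningless</li>
</ul>
</li>
<li>Person detection benchmark
<ul>
<li>The person detection benchmark uses two BMP images as input</li>
<li>They are located in the tensorflow\lite\micro\examples\person_detection\testdata subdirectory</li>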
</ul>
</li>
</ul>
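<p>The random-input approach of the keyword benchmark can be sketched like this: the input buffer is filled with pseudo-random int8 values from a fixed seed, so every run times the same (meaningless) input. MakeRandomInput is a hypothetical helper and the int8 input type is an assumption; this is not the code in keyword_benchmark.cc.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: builds an int8 input buffer filled with pseudo-random
// values. A fixed seed makes every run time exactly the same input.
std::vector<std::int8_t> MakeRandomInput(std::size_t length, unsigned seed) {
  std::mt19937 gen(seed);
  // uniform_int_distribution is not defined for char-sized types,
  // so draw ints in the int8 range and narrow them.
  std::uniform_int_distribution<int> dist(
      std::numeric_limits<std::int8_t>::min(),
      std::numeric_limits<std::int8_t>::max());
  std::vector<std::int8_t> input(length);
  for (std::int8_t& value : input) {
    value = static_cast<std::int8_t>(dist(gen));
  }
  return input;
}
```

<p>Since the data is noise, only the elapsed time is meaningful; the classification result is not.</p>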

<p> </p>

<h3>Installing dependencies</h3>

<p>On a Linux PC, install a few tools before running the TFLM benchmarks:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>sudo apt install git unzip wget python3 python3-pip
</code></pre>

<h3>Benchmark commands</h3>

<p>Following the "Run on x86" section of the benchmark documentation, the command to run the keyword benchmark on an x86 PC is:</p>

<p><code>make -f tensorflow/lite/micro/tools/make/Makefile run_keyword_benchmark</code></p>

<p>The command to run the person detection benchmark on a PC is:</p>

<p><code>make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark</code></p>

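<p>Each of these commands performs the following steps in order:</p>

<ol>
<li>run several download scripts to fetch dependent libraries and data sets;</li>
<li>build the benchmark program;</li>
<li>run the benchmark program.</li>
</ol>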

<p>The following snippet of <code>tensorflow/lite/micro/tools/make/Makefile</code> shows the download scripts being called:</p>


<p>The first time flatbuffers_download.sh and kissfft_download.sh run, they download the corresponding archives and extract them locally; see the scripts for details.</p>

<p>pigweed_download.sh clones a repository and then checks out a specific revision:</p>


<p>Note that the repository <a href="https://pigweed.googlesource.com/pigweed/pigweed">https://pigweed.googlesource.com/pigweed/pigweed</a> is generally not reachable from mainland China, because the googlesource.com domain is blocked. Changing this URL to my mirror, <a href="https://github.com/xusiwei/pigweed.git">https://github.com/xusiwei/pigweed.git</a>, works around the problem and lets the pigweed download succeed.</p>

<p> </p>

<h3>Build rules for the benchmarks</h3>

<p><code>tensorflow/lite/micro/tools/make/Makefile</code> is the top-level Makefile. It defines several make macro functions and pulls in other files via include, among them <code>tensorflow/lite/micro/benchmarks/Makefile.inc</code>, which defines the benchmark build rules:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>KEYWORD_BENCHMARK_SRCS := \
tensorflow/lite/micro/benchmarks/keyword_benchmark.cc

KEYWORD_BENCHMARK_GENERATOR_INPUTS := \
tensorflow/lite/micro/models/keyword_scrambled.tflite

KEYWORD_BENCHMARK_HDRS := \
tensorflow/lite/micro/benchmarks/micro_benchmark.h

KEYWORD_BENCHMARK_8BIT_SRCS := \
tensorflow/lite/micro/benchmarks/keyword_benchmark_8bit.cc

KEYWORD_BENCHMARK_8BIT_GENERATOR_INPUTS := \
tensorflow/lite/micro/models/keyword_scrambled_8bit.tflite

KEYWORD_BENCHMARK_8BIT_HDRS := \
tensorflow/lite/micro/benchmarks/micro_benchmark.h

PERSON_DETECTION_BENCHMARK_SRCS := \
tensorflow/lite/micro/benchmarks/person_detection_benchmark.cc

PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS := \
tensorflow/lite/micro/examples/person_detection/testdata/person.bmp \
tensorflow/lite/micro/examples/person_detection/testdata/no_person.bmp

ifneq ($(CO_PROCESSOR),ethos_u)
PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS += \
tensorflow/lite/micro/models/person_detect.tflite
else
# Ethos-U use a Vela optimized version of the original model.
PERSON_DETECTION_BENCHMARK_SRCS += \
$(GENERATED_SRCS_DIR)tensorflow/lite/micro/models/person_detect_model_data_vela.cc
endif

PERSON_DETECTION_BENCHMARK_HDRS := \
tensorflow/lite/micro/examples/person_detection/model_settings.h \
tensorflow/lite/micro/benchmarks/micro_benchmark.h

# Builds a standalone binary.
$(eval $(call microlite_test,keyword_benchmark,\
$(KEYWORD_BENCHMARK_SRCS),$(KEYWORD_BENCHMARK_HDRS),$(KEYWORD_BENCHMARK_GENERATOR_INPUTS)))

# Builds a standalone binary.
$(eval $(call microlite_test,keyword_benchmark_8bit,\
$(KEYWORD_BENCHMARK_8BIT_SRCS),$(KEYWORD_BENCHMARK_8BIT_HDRS),$(KEYWORD_BENCHMARK_8BIT_GENERATOR_INPUTS)))

$(eval $(call microlite_test,person_detection_benchmark,\
$(PERSON_DETECTION_BENCHMARK_SRCS),$(PERSON_DETECTION_BENCHMARK_HDRS),$(PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS)))
</code></pre>

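<p>This shows that there are actually three benchmark programs, one more than the documentation lists: keyword_benchmark_8bit, which is presumably an 8-bit quantized version of keyword_benchmark. It also shows that three .tflite model files are involved.</p>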
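<p>If keyword_benchmark_8bit is indeed an 8-bit quantized variant, its tensors store int8 values that map back to real numbers through a scale and a zero point, following the affine scheme of the TensorFlow Lite quantization specification: real_value = (int8_value - zero_point) × scale. The helpers below are a hypothetical illustration of that mapping, not TFLM kernel code.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Maps a real value to int8: q = round(real / scale) + zero_point,
// saturated to the int8 range [-128, 127].
std::int8_t Quantize(float real, float scale, std::int32_t zero_point) {
  std::int32_t q =
      static_cast<std::int32_t>(std::lround(real / scale)) + zero_point;
  q = std::max<std::int32_t>(-128, std::min<std::int32_t>(127, q));
  return static_cast<std::int8_t>(q);
}

// Maps an int8 value back to a real value: real = (q - zero_point) * scale.
float Dequantize(std::int8_t q, float scale, std::int32_t zero_point) {
  return static_cast<float>(static_cast<std::int32_t>(q) - zero_point) * scale;
}
```

<p>Storing weights and activations this way cuts model size to roughly a quarter of float32 and lets kernels use integer arithmetic, which is why 8-bit models are the usual choice on MCUs.</p>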

<h3>Keyword benchmark</h3>

<p>The keyword benchmark uses a small model, well suited to low-clocked MCUs such as the STM32 F3/F4 series (on the order of 100 MHz).</p>

<p>Because the model is small and needs little computation, the benchmark finishes very quickly on a PC:</p>

<p>As you can see, keyword spotting runs very fast on a PC: 10 runs take less than 1 ms.</p>

<p>The model file is ./tensorflow/lite/micro/models/keyword_scrambled.tflite</p>

<p>You can inspect the model structure with Netron.</p>

<p> </p>

<h3>Person detection benchmark</h3>

<p>The person detection benchmark needs considerably more computation and takes longer to run:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>xu@VirtualBox:~/opensource/tflite-micro$ make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark
tensorflow/lite/micro/tools/make/downloads/flatbuffers already exists, skipping the download.
tensorflow/lite/micro/tools/make/downloads/kissfft already exists, skipping the download.
tensorflow/lite/micro/tools/make/downloads/pigweed already exists, skipping the download.
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter-DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter-DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter-DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter-DTF_LITE_USE_CTIME -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/bin/person_detection_benchmark tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/benchmarks/person_detection_benchmark.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/lib/libtensorflow-microlite.a -Wl,--fatal-warnings -Wl,--gc-sections -lm
tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/bin/person_detection_benchmark non_test_binary linux
InitializeBenchmarkRunner took 192 ticks (0 ms).

WithPersonDataIterations(1) took 32299 ticks (32 ms)
DEPTHWISE_CONV_2D took 895 ticks (0 ms).
DEPTHWISE_CONV_2D took 895 ticks (0 ms).
CONV_2D took 1801 ticks (1 ms).
DEPTHWISE_CONV_2D took 424 ticks (0 ms).
CONV_2D took 1465 ticks (1 ms).
DEPTHWISE_CONV_2D took 921 ticks (0 ms).
CONV_2D took 2725 ticks (2 ms).
DEPTHWISE_CONV_2D took 206 ticks (0 ms).
CONV_2D took 1367 ticks (1 ms).
DEPTHWISE_CONV_2D took 423 ticks (0 ms).
CONV_2D took 2540 ticks (2 ms).
DEPTHWISE_CONV_2D took 102 ticks (0 ms).
CONV_2D took 1265 ticks (1 ms).
DEPTHWISE_CONV_2D took 205 ticks (0 ms).
CONV_2D took 2449 ticks (2 ms).
DEPTHWISE_CONV_2D took 204 ticks (0 ms).
CONV_2D took 2449 ticks (2 ms).
DEPTHWISE_CONV_2D took 243 ticks (0 ms).
CONV_2D took 2483 ticks (2 ms).
DEPTHWISE_CONV_2D took 202 ticks (0 ms).
CONV_2D took 2481 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2489 ticks (2 ms).
DEPTHWISE_CONV_2D took 52 ticks (0 ms).
CONV_2D took 1222 ticks (1 ms).
DEPTHWISE_CONV_2D took 90 ticks (0 ms).
CONV_2D took 2485 ticks (2 ms).
AVERAGE_POOL_2D took 8 ticks (0 ms).
CONV_2D took 3 ticks (0 ms).
RESHAPE took 0 ticks (0 ms).
SOFTMAX took 2 ticks (0 ms).

NoPersonDataIterations(1) took 32148 ticks (32 ms)
DEPTHWISE_CONV_2D took 906 ticks (0 ms).
DEPTHWISE_CONV_2D took 924 ticks (0 ms).
CONV_2D took 1762 ticks (1 ms).
DEPTHWISE_CONV_2D took 446 ticks (0 ms).
CONV_2D took 1466 ticks (1 ms).
DEPTHWISE_CONV_2D took 897 ticks (0 ms).
CONV_2D took 2692 ticks (2 ms).
DEPTHWISE_CONV_2D took 209 ticks (0 ms).
CONV_2D took 1366 ticks (1 ms).
DEPTHWISE_CONV_2D took 427 ticks (0 ms).
CONV_2D took 2548 ticks (2 ms).
DEPTHWISE_CONV_2D took 102 ticks (0 ms).
CONV_2D took 1258 ticks (1 ms).
DEPTHWISE_CONV_2D took 208 ticks (0 ms).
CONV_2D took 2473 ticks (2 ms).
DEPTHWISE_CONV_2D took 210 ticks (0 ms).
CONV_2D took 2460 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2461 ticks (2 ms).
DEPTHWISE_CONV_2D took 230 ticks (0 ms).
CONV_2D took 2443 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2467 ticks (2 ms).
DEPTHWISE_CONV_2D took 51 ticks (0 ms).
CONV_2D took 1224 ticks (1 ms).
DEPTHWISE_CONV_2D took 89 ticks (0 ms).
CONV_2D took 2412 ticks (2 ms).
AVERAGE_POOL_2D took 7 ticks (0 ms).
CONV_2D took 2 ticks (0 ms).
RESHAPE took 0 ticks (0 ms).
SOFTMAX took 2 ticks (0 ms).

WithPersonDataIterations(10) took 326947 ticks (326 ms)

NoPersonDataIterations(10) took 352888 ticks (352 ms)
</code></pre>

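<p>Running the person detection model 10 times takes 326 ms with the person image and 352 ms without, i.e. roughly 33 to 35 ms per run. These results were obtained in a Linux VirtualBox virtual machine on a Windows 10 host with an AMD Ryzen 7 4800H CPU.</p>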
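<p>To see where the time goes, the per-operator lines can be totaled by operator type. The sketch below parses lines of the form shown above ("OP took N ticks (M ms).") and sums ticks per operator. SumTicksPerOp is a hypothetical helper; it assumes operator names are upper-case identifiers, so summary lines such as WithPersonDataIterations(1) and InitializeBenchmarkRunner are skipped. It does not distinguish profile sections, so feed it one section at a time.</p>

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: sums the tick counts of per-operator profile lines
// such as "CONV_2D took 1801 ticks (1 ms)." from a TFLM benchmark log.
std::map<std::string, std::int64_t> SumTicksPerOp(const std::string& log) {
  // Operator names are assumed to be upper-case identifiers; trailing
  // whitespace (e.g. '\r' from a serial terminal) is tolerated.
  static const std::regex kOpLine(
      R"(^([A-Z0-9_]+) took (\d+) ticks \(\d+ ms\)\.?\s*$)");
  std::map<std::string, std::int64_t> totals;
  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    std::smatch m;
    if (std::regex_match(line, m, kOpLine)) {
      totals[m[1].str()] += std::stoll(m[2].str());
    }
  }
  return totals;
}
```

<p>For the WithPersonDataIterations(1) section of the PC log above, the CONV_2D lines add up to 27224 ticks and the DEPTHWISE_CONV_2D lines to 5065 ticks, out of 32299 ticks in total, so the regular convolutions dominate the run time.</p>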

<p>The model file is ./tensorflow/lite/micro/models/person_detect.tflite</p>

<p>Again, Netron can be used to inspect the model structure.</p>

<p> </p>

<h2>TFLM in the HPM SDK</h2>

<h3>TFLM middleware</h3>

<p>The HPM SDK integrates TFLM as middleware (similar to a library, but not built as a separate library), located in the hpm_sdk\middleware subdirectory:</p>


<p>The code in this subdirectory is a trimmed-down copy of the TFLM open-source project, with many unneeded files removed.</p>

<p> </p>

<h3>TFLM sample</h3>

<p>The HPM SDK also provides a TFLM sample in the hpm_sdk\samples\tflm subdirectory:</p>


<p>The sample is adapted from the official person_detection example, adding camera image capture and showing the result on an LCD.</p>

<p>Since I don't have the matching camera and display, I did not use this sample for the experiments in this post.</p>

<p> </p>

<h2>Running the TFLM benchmark on the HPM6750</h2>

<p>The following uses the person detection benchmark as an example of how to run a TFLM benchmark on the HPM6750.</p>

<h3>Adding the person detection benchmark sources to the HPM SDK</h3>

<p>Add the person detection benchmark source files to the HPM SDK as follows:</p>

<ol>
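<li>In the samples subdirectory of the HPM SDK, create a tflm_person_detect_benchmark directory, and create a src directory inside it;</li>
<li>From the tflite-micro directory in which the person detection benchmark has already been run (as described above), copy the following files into src: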
<ol>
<li>tensorflow\lite\micro\benchmarks\person_detection_benchmark.cc</li>
<li>tensorflow\lite\micro\benchmarks\micro_benchmark.h</li>
<li>tensorflow\lite\micro\examples\person_detection\model_settings.h</li>
<li>tensorflow\lite\micro\examples\person_detection\model_settings.cc</li>
</ol>
</li>
<li>Create a testdata subdirectory in src, and copy all files from the following tflite-micro directory into it:
<ol>
<li>tensorflow\lite\micro\tools\make\gen\linux_x86_64_default\genfiles\tensorflow\lite\micro\examples\person_detection\testdata</li>
</ol>
</li>
<li>Adjust the paths in some of the #include directives in person_detection_benchmark.cc, model_settings.cc, no_person_image_data.cc and person_image_data.cc to match the new relative locations;</li>
<li>In person_detection_benchmark.cc, add a call to board_init(); at the start of main, and add #include "board.h" at the top.</li>
</ol>

<h3>Adding CMakeLists.txt and app.yaml</h3>

<p>Create a CMakeLists.txt file next to src with the following content:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>cmake_minimum_required(VERSION 3.13)

set(CONFIG_TFLM 1)

find_package(hpm-sdk REQUIRED HINTS $ENV{HPM_SDK_BASE})
project(tflm_person_detect_benchmark)
set(CMAKE_CXX_STANDARD 11)

sdk_app_src(src/model_settings.cc)
sdk_app_src(src/person_detection_benchmark.cc)
sdk_app_src(src/testdata/no_person_image_data.cc)
sdk_app_src(src/testdata/person_image_data.cc)

sdk_app_inc(src)
sdk_ld_options("-lm")
sdk_ld_options("--std=c++11")
sdk_compile_definitions(__HPMICRO__)
sdk_compile_definitions(-DINIT_EXT_RAM_FOR_DATA=1)
# sdk_compile_options("-mabi=ilp32f")
# sdk_compile_options("-march=rv32imafc")
sdk_compile_options("-O2")
# sdk_compile_options("-O3")
set(SEGGER_LEVEL_O3 1)
generate_ses_project()
</code></pre>

<p>Create an app.yaml file next to src with the following content:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>dependency:
- tflm
</code></pre>

<h3>Building and running the TFLM benchmark</h3>

<p>What follows is the familiar part: building and running.</p>

<p>First, generate the project with generate_project:</p>


<p>Next, connect the HPM6750 board to the PC and open the newly generated project in Embedded Studio:</p>

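<p>Because the project pulls in the TFLM sources, it contains many files, so indexing in the source navigation pane on the right takes a long time to finish.</p>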

<p>You can then build the project with F7 and debug it with F5:</p>


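<p>After the build finishes, first open a serial terminal on the board's serial port at 115200 baud. Then start debugging and simply let the program continue; the benchmark output appears in the serial terminal:</p>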

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>==============================
hpm6750evkmini clock summary
==============================
cpu0:          816000000Hz
cpu1:          816000000Hz
axi0:          200000000Hz
axi1:          200000000Hz
axi2:          200000000Hz
ahb:          200000000Hz
mchtmr0:       24000000Hz
mchtmr1:       1000000Hz
xpi0:          133333333Hz
xpi1:          400000000Hz
dram:          166666666Hz
display:       74250000Hz
cam0:          59400000Hz
cam1:          59400000Hz
jpeg:          200000000Hz
pdma:          200000000Hz
==============================

----------------------------------------------------------------------
(HPMicro ASCII-art logo)
----------------------------------------------------------------------
InitializeBenchmarkRunner took 114969 ticks (4 ms).

WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms).
DEPTHWISE_CONV_2D took 274903 ticks (11 ms).
CONV_2D took 868518 ticks (36 ms).
DEPTHWISE_CONV_2D took 68180 ticks (2 ms).
CONV_2D took 434392 ticks (18 ms).
DEPTHWISE_CONV_2D took 132918 ticks (5 ms).
CONV_2D took 843014 ticks (35 ms).
DEPTHWISE_CONV_2D took 33228 ticks (1 ms).
CONV_2D took 423288 ticks (17 ms).
DEPTHWISE_CONV_2D took 62040 ticks (2 ms).
CONV_2D took 833033 ticks (34 ms).
DEPTHWISE_CONV_2D took 62198 ticks (2 ms).
CONV_2D took 834644 ticks (34 ms).
DEPTHWISE_CONV_2D took 62176 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62206 ticks (2 ms).
CONV_2D took 832857 ticks (34 ms).
DEPTHWISE_CONV_2D took 62194 ticks (2 ms).
CONV_2D took 832882 ticks (34 ms).
DEPTHWISE_CONV_2D took 16050 ticks (0 ms).
CONV_2D took 438774 ticks (18 ms).
DEPTHWISE_CONV_2D took 27494 ticks (1 ms).
CONV_2D took 974362 ticks (40 ms).
AVERAGE_POOL_2D took 2323 ticks (0 ms).
CONV_2D took 1128 ticks (0 ms).
RESHAPE took 184 ticks (0 ms).
SOFTMAX took 2249 ticks (0 ms).

NoPersonDataIterations(1) took 10694160 ticks (445 ms)
DEPTHWISE_CONV_2D took 274922 ticks (11 ms).
DEPTHWISE_CONV_2D took 281095 ticks (11 ms).
CONV_2D took 515380 ticks (21 ms).
DEPTHWISE_CONV_2D took 139428 ticks (5 ms).
CONV_2D took 460039 ticks (19 ms).
DEPTHWISE_CONV_2D took 275255 ticks (11 ms).
CONV_2D took 868787 ticks (36 ms).
DEPTHWISE_CONV_2D took 68384 ticks (2 ms).
CONV_2D took 434537 ticks (18 ms).
DEPTHWISE_CONV_2D took 133071 ticks (5 ms).
CONV_2D took 843202 ticks (35 ms).
DEPTHWISE_CONV_2D took 33291 ticks (1 ms).
CONV_2D took 423388 ticks (17 ms).
DEPTHWISE_CONV_2D took 62190 ticks (2 ms).
CONV_2D took 832978 ticks (34 ms).
DEPTHWISE_CONV_2D took 62205 ticks (2 ms).
CONV_2D took 834636 ticks (34 ms).
DEPTHWISE_CONV_2D took 62213 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62239 ticks (2 ms).
CONV_2D took 832850 ticks (34 ms).
DEPTHWISE_CONV_2D took 62217 ticks (2 ms).
CONV_2D took 832856 ticks (34 ms).
DEPTHWISE_CONV_2D took 16040 ticks (0 ms).
CONV_2D took 438779 ticks (18 ms).
DEPTHWISE_CONV_2D took 27481 ticks (1 ms).
CONV_2D took 974354 ticks (40 ms).
AVERAGE_POOL_2D took 1812 ticks (0 ms).
CONV_2D took 1077 ticks (0 ms).
RESHAPE took 341 ticks (0 ms).
SOFTMAX took 901 ticks (0 ms).

WithPersonDataIterations(10) took 106960312 ticks (4456 ms)

NoPersonDataIterations(10) took 106964554 ticks (4456 ms)
</code></pre>

<p>On the HPM6750EVKMINI board, running the person detection model 10 times in a row takes 4456 ms in total, or 445.6 ms per run on average.</p>
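<p>The log reports time both in ticks and in milliseconds. Dividing one by the other gives roughly 24,000 ticks per millisecond, which matches the 24 MHz mchtmr0 clock in the clock summary, so the benchmark's time source on this board appears to be that machine timer (the PC log is consistent with 1,000 ticks per millisecond). The millisecond values also look truncated rather than rounded: 114969 ticks is printed as 4 ms. Under those assumptions, the sketch below bounds the tick rate from one ticks/ms pair and reproduces the truncating conversion; EstimateTickRate and TicksToMs are hypothetical helpers, not TFLM functions.</p>

```cpp
#include <cstdint>

// If the printed ms value is floor(ticks / ticks_per_ms), then
//   ms <= ticks / ticks_per_ms < ms + 1
// so ticks / (ms + 1) < ticks_per_ms <= ticks / ms.
struct TickRateBounds {
  double lower_hz;  // exclusive lower bound
  double upper_hz;  // inclusive upper bound
};

// Requires ms > 0.
TickRateBounds EstimateTickRate(std::int64_t ticks, std::int64_t ms) {
  TickRateBounds b;
  b.lower_hz = static_cast<double>(ticks) * 1000.0 / static_cast<double>(ms + 1);
  b.upper_hz = static_cast<double>(ticks) * 1000.0 / static_cast<double>(ms);
  return b;
}

// Converts ticks to whole milliseconds with integer (truncating) division.
std::int64_t TicksToMs(std::int64_t ticks, std::int64_t ticks_per_second) {
  return ticks / (ticks_per_second / 1000);
}
```

<p>For example, the pair 106960312 ticks / 4456 ms bounds the tick rate to between about 23.998 MHz and 24.004 MHz, which brackets the 24 MHz mchtmr0 clock.</p>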

<p> </p>

<h2>Running the TFLM benchmark on a Raspberry Pi 3B+</h2>

<h3>Running the TFLM benchmark on the Raspberry Pi</h3>

<p>On the Raspberry Pi 3B+ the procedure is the same as on the PC: after downloading the sources, run the same make command directly:</p>

<p><code>make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark</code></p>

<p>After a while, the benchmark results are printed:</p>

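<p>On the Raspberry Pi 3B+, running the person detection model 10 times in a row takes 4186 ms in total for the image with a person (418.6 ms per run on average) and 4190 ms for the image without a person (419 ms per run on average).</p>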

<p> </p>

<h3>Comparing the HPM6750, AMD R7 4800H and Raspberry Pi 3B+ results</h3>

<p>The average person detection inference times on the HPM6750EVKMINI board, the Raspberry Pi 3B+ and the AMD R7 4800H are summarized below:</p>

<table>
<thead>
<tr>
<th> </th>
<th>Raspberry Pi 3B+</th>
<th>HPM6750EVKMINI</th>
<th>AMD R7 4800H</th>
</tr>
</thead>
<tbody>
<tr>
<td>Avg. time, with person (ms)</td>
<td>418.6</td>
<td>445.6</td>
<td>32.6</td>
</tr>
<tr>
<td>Avg. time, no person (ms)</td>
<td>419</td>
<td>445.6</td>
<td>35.2</td>
</tr>
<tr>
<td>Max CPU clock</td>
<td>1.4 GHz</td>
<td>816 MHz</td>
<td>4.2 GHz</td>
</tr>
</tbody>
</table>

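<p>On the TFLM person detection workload, the HPM6750EVKMINI and the Raspberry Pi 3B+ perform about the same. Although the HPM6750's 816 MHz clock is lower than the 1.4 GHz of the Cortex-A53 in the Raspberry Pi 3B+'s BCM2837, its single-core compute performance is not far behind.</p>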
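<p>One rough way to put the table on a per-clock footing is to multiply each average latency by the nominal maximum clock, which estimates the core cycles spent per inference. This assumes each core stays at its maximum clock for the whole run, which is probably not exactly true on the Raspberry Pi (frequency scaling) or on a laptop CPU inside a VM; CyclesPerInference is a hypothetical helper.</p>

```cpp
// Hypothetical helper: rough estimate of core clock cycles per inference.
// Assumes the core runs at clock_mhz for the entire inference.
double CyclesPerInference(double avg_ms, double clock_mhz) {
  const double seconds = avg_ms / 1000.0;
  const double hz = clock_mhz * 1e6;
  return seconds * hz;
}
```

<p>With the table's numbers this gives about 3.6 × 10^8 cycles per inference on the HPM6750 (445.6 ms at 816 MHz) and about 5.9 × 10^8 on the Raspberry Pi 3B+ (418.6 ms at 1.4 GHz), so under these assumptions the HPM6750 does the same work in fewer clock cycles.</p>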

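<p>Note that the benchmark on the Raspberry Pi 3B+ runs on a 64-bit Debian Linux distribution, whereas on the HPM6750 it runs on bare metal. The operating system's task scheduler costs some CPU capacity, so this is not a rigorous comparison and the results are for reference only.</p>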

<p> </p>

<h2>References</h2>

<p>For more information, see the TFLM website and the project source code.</p>

<ol>
<li>TFLite guide: <a href="https://tensorflow.google.cn/lite/guide?hl=zh-cn">https://tensorflow.google.cn/lite/guide?hl=zh-cn</a></li>
<li>TFLM overview: <a href="https://tensorflow.google.cn/lite/microcontrollers/overview?hl=zh-cn">https://tensorflow.google.cn/lite/microcontrollers/overview?hl=zh-cn</a></li>
<li>TensorFlow website: <a href="https://tensorflow.google.cn/">https://tensorflow.google.cn/</a></li>
</ol>

Posted by Jacktang on 2022-6-26 07:13

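<p>Learned about TFLM from the OP; Google really is impressive.<br />
I also got to know the TFLM benchmarks. From the OP's Raspberry Pi run, single-core performance is not that different, but in the multi-core era there will be a difference.</p>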


EEWorld Forum's Archiver
