xusiwei1236 posted on 2022-6-25 23:14

[HPMicro HPM6750 Review] Running an Edge AI Framework: TFLM Benchmarks

Last edited by xusiwei1236 on 2022-6-25 23:40

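<p>This post introduces TFLM, walks through the official TFLM benchmarks, shows how to run them on the HPM6750, and compares the results with those from a Raspberry Pi 3B+.</p>

<h2>What Is TFLM?</h2>

<p>You have probably heard of TensorFlow, the open-source machine learning library developed by Google that supports both model training and model inference.</p>

<p>TFLM stands for TensorFlow Lite for Microcontrollers. So what is TensorFlow Lite?</p>

<p>TensorFlow Lite (usually abbreviated TFLite) is the TensorFlow team's solution for deploying models to mobile devices; loosely speaking, it is TensorFlow for phones. The TensorFlow website describes TFLite as follows:</p>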

<blockquote>
<p>TensorFlow Lite is a set of tools that helps developers run models on mobile, embedded, and IoT devices, enabling on-device machine learning.</p>
</blockquote>

<p>TensorFlow Lite for Microcontrollers (TFLM), the subject of this post, is the microcontroller version of TensorFlow Lite. The official site introduces it like this:</p>

<blockquote>
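<p>TensorFlow Lite for Microcontrollers (TFLM below) is an experimental port of TensorFlow Lite for microcontrollers and other devices with only a few kilobytes of memory. It runs directly on bare metal and requires no operating system support, no standard C/C++ libraries, and no dynamic memory allocation. The core runtime needs only 16 KB on a Cortex M3, and with enough operators to run a speech keyword detection model it still needs only 22 KB.</p>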
</blockquote>

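<p>All three come from Google and share one lineage. TensorFlow supports both training and inference, while the other two support inference only. TFLite mainly targets mobile devices such as phones and tablets, and TFLM targets microcontrollers. Historically, the latter two are side projects of TensorFlow, and the relationship is tree-shaped: TFLite branched off from TensorFlow, TFLM (the tflite-micro repository) branched off from TFLite, and all three are now developed in parallel. For a long time the source code of all three lived in a single repository, and the directory layout reflected that: the TensorFlow tree contained the other two, and the TFLite tree contained tflite-micro.</p>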

<p>&nbsp;</p>

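<h2>The TFLM Open-Source Project</h2>

<p>In June 2021, Google moved the TFLM source code out of the main TensorFlow repository into a standalone repository.</p>

<p>As of this writing (June 2022), however, the TFLite source code is still maintained as a subdirectory of the TensorFlow project, which suggests how much importance Google attaches to TFLM.</p>

<p>TFLM repository: <a href="https://github.com/tensorflow/tflite-micro">https://github.com/tensorflow/tflite-micro</a></p>

<p>Clone command: git clone <a href="https://github.com/tensorflow/tflite-micro.git">https://github.com/tensorflow/tflite-micro.git</a></p>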

<p>The main TFLM code lives in the tensorflow\lite\micro subdirectory.</p>

<p>TFLM officially supports building with make and Bazel.</p>

<p>&nbsp;</p>

<h3>TFLM Benchmarks</h3>

<p>The top-level README.md of the TFLM repository links to the benchmark documentation:</p>

<p><a href="https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/benchmarks/README.md">https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/benchmarks/README.md</a></p>

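<p>The document is short. Its table of contents lists two benchmarks (the build rules actually define three):</p>

<ul>
        <li>Keyword benchmark
        <ul>
                <li>The keyword benchmark uses random data generated at run time as input, so its output is meaningless.</li>
        </ul>
        </li>
        <li>Person detection benchmark
        <ul>
                <li>The person detection benchmark uses two BMP images as input.</li>
                <li>They are located in the tensorflow\lite\micro\examples\person_detection\testdata subdirectory.</li>
        </ul>
        </li>
</ul>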

<p>&nbsp;</p>

<h3>Installing Dependencies</h3>

<p>Before running the TFLM benchmarks on a Linux PC, install the tools they depend on:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>sudo apt install git unzip wget python3 python3-pip
</code></pre>

<h3>Benchmark Commands</h3>

<p>Following the &ldquo;Run on x86&rdquo; section of that document, the command to run the keyword benchmark on an x86 PC is:</p>

<p><code>make -f tensorflow/lite/micro/tools/make/Makefile run_keyword_benchmark</code></p>

<p>The command to run the person detection benchmark on a PC is:</p>

<p><code>make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark</code></p>

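<p>Either command performs the following steps in order:</p>

<ol>
        <li>Call several download scripts to fetch the dependency libraries and datasets;</li>
        <li>Build the benchmark program;</li>
        <li>Run the benchmark program.</li>
</ol>

<p>The <code>tensorflow/lite/micro/tools/make/Makefile</code> file invokes several download scripts.</p>

<p>The first time flatbuffers_download.sh and kissfft_download.sh run, they download the corresponding archives and unpack them locally; see the scripts themselves for details.</p>

<p>The pigweed_download.sh script clones a repository and then checks out a specific revision.</p>

<p>Note that the repository <a href="https://pigweed.googlesource.com/pigweed/pigweed">https://pigweed.googlesource.com/pigweed/pigweed</a> is generally unreachable from mainland China, because the googlesource.com domain is blocked. Changing this URL to the mirror I cloned, <a href="https://github.com/xusiwei/pigweed.git">https://github.com/xusiwei/pigweed.git</a>, works around the failed Pigweed download.</p>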
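<p>The URL swap described above can be scripted. The following is a minimal sketch, not part of the original workflow: the two URLs come from this post, while the script location under tensorflow/lite/micro/tools/make/ and the helper names <code>use_mirror</code> and <code>patch_script</code> are assumptions made for illustration.</p>

```python
from pathlib import Path

# Both URLs are taken from the post.
ORIGINAL_URL = "https://pigweed.googlesource.com/pigweed/pigweed"
MIRROR_URL = "https://github.com/xusiwei/pigweed.git"


def use_mirror(script_text: str) -> str:
    """Return the script text with the Pigweed clone URL swapped for the mirror."""
    return script_text.replace(ORIGINAL_URL, MIRROR_URL)


def patch_script(script_path: Path) -> bool:
    """Rewrite the script in place; return True if anything changed."""
    before = script_path.read_text(encoding="utf-8")
    after = use_mirror(before)
    if after != before:
        script_path.write_text(after, encoding="utf-8")
    return after != before


if __name__ == "__main__":
    # Assumed location of the download script inside the tflite-micro checkout.
    script = Path("tensorflow/lite/micro/tools/make/pigweed_download.sh")
    if script.exists():
        print("patched" if patch_script(script) else "nothing to change")
    else:
        print(f"{script} not found; run this from the tflite-micro root")
```

<p>Running it a second time changes nothing, because the mirror URL does not contain the original one.</p>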

<p>&nbsp;</p>

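<h3>Build Rules for the Benchmarks</h3>

<p><code>tensorflow/lite/micro/tools/make/Makefile</code> is the top-level Makefile. It defines a number of Makefile macro functions and pulls in other files via include, among them <code>tensorflow/lite/micro/benchmarks/Makefile.inc</code>, which defines the build rules for the benchmarks:</p>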

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>KEYWORD_BENCHMARK_SRCS := \
tensorflow/lite/micro/benchmarks/keyword_benchmark.cc

KEYWORD_BENCHMARK_GENERATOR_INPUTS := \
tensorflow/lite/micro/models/keyword_scrambled.tflite

KEYWORD_BENCHMARK_HDRS := \
tensorflow/lite/micro/benchmarks/micro_benchmark.h

KEYWORD_BENCHMARK_8BIT_SRCS := \
tensorflow/lite/micro/benchmarks/keyword_benchmark_8bit.cc

KEYWORD_BENCHMARK_8BIT_GENERATOR_INPUTS := \
tensorflow/lite/micro/models/keyword_scrambled_8bit.tflite

KEYWORD_BENCHMARK_8BIT_HDRS := \
tensorflow/lite/micro/benchmarks/micro_benchmark.h

PERSON_DETECTION_BENCHMARK_SRCS := \
tensorflow/lite/micro/benchmarks/person_detection_benchmark.cc

PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS := \
tensorflow/lite/micro/examples/person_detection/testdata/person.bmp \
tensorflow/lite/micro/examples/person_detection/testdata/no_person.bmp

ifneq ($(CO_PROCESSOR),ethos_u)
PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS += \
    tensorflow/lite/micro/models/person_detect.tflite
else
# Ethos-U use a Vela optimized version of the original model.
PERSON_DETECTION_BENCHMARK_SRCS += \
$(GENERATED_SRCS_DIR)tensorflow/lite/micro/models/person_detect_model_data_vela.cc
endif

PERSON_DETECTION_BENCHMARK_HDRS := \
tensorflow/lite/micro/examples/person_detection/model_settings.h \
tensorflow/lite/micro/benchmarks/micro_benchmark.h

# Builds a standalone binary.
$(eval $(call microlite_test,keyword_benchmark,\
$(KEYWORD_BENCHMARK_SRCS),$(KEYWORD_BENCHMARK_HDRS),$(KEYWORD_BENCHMARK_GENERATOR_INPUTS)))

# Builds a standalone binary.
$(eval $(call microlite_test,keyword_benchmark_8bit,\
$(KEYWORD_BENCHMARK_8BIT_SRCS),$(KEYWORD_BENCHMARK_8BIT_HDRS),$(KEYWORD_BENCHMARK_8BIT_GENERATOR_INPUTS)))

$(eval $(call microlite_test,person_detection_benchmark,\
$(PERSON_DETECTION_BENCHMARK_SRCS),$(PERSON_DETECTION_BENCHMARK_HDRS),$(PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS)))
</code></pre>

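<p>This shows that there are actually three benchmark programs, one more than the documentation lists: keyword_benchmark_8bit, presumably the 8-bit quantized version of keyword_benchmark. It also shows three .tflite model files.</p>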
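<p>One way to double-check which benchmark targets Makefile.inc defines is to pull the first argument out of every <code>microlite_test</code> call. This is a minimal sketch: the regular expression and the function name <code>benchmark_targets</code> are my own, and the sample text only reproduces the three <code>$(eval ...)</code> lines from the file.</p>

```python
import re
from typing import List

# Matches the first argument of each microlite_test call.
TARGET_PATTERN = re.compile(r"\$\(call\s+microlite_test\s*,\s*([A-Za-z0-9_]+)")

# The three rule invocations from Makefile.inc, copied as plain text.
SAMPLE = r"""
$(eval $(call microlite_test,keyword_benchmark,\
$(KEYWORD_BENCHMARK_SRCS),$(KEYWORD_BENCHMARK_HDRS),$(KEYWORD_BENCHMARK_GENERATOR_INPUTS)))
$(eval $(call microlite_test,keyword_benchmark_8bit,\
$(KEYWORD_BENCHMARK_8BIT_SRCS),$(KEYWORD_BENCHMARK_8BIT_HDRS),$(KEYWORD_BENCHMARK_8BIT_GENERATOR_INPUTS)))
$(eval $(call microlite_test,person_detection_benchmark,\
$(PERSON_DETECTION_BENCHMARK_SRCS),$(PERSON_DETECTION_BENCHMARK_HDRS),$(PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS)))
"""


def benchmark_targets(makefile_text: str) -> List[str]:
    """Return the benchmark names in the order they are declared."""
    return TARGET_PATTERN.findall(makefile_text)


for name in benchmark_targets(SAMPLE):
    print(name)
```

<p>The two commands shown earlier prefix these names with <code>run_</code>; whether the 8-bit variant follows the same naming is an assumption I have not verified.</p>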

<h3>Keyword Benchmark</h3>

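<p>The keyword benchmark uses a small model, which makes it a good fit for lower-clocked MCUs such as the STM32 F3/F4 series (up to 72 MHz for the F3 and up to 180 MHz for the F4).</p>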

<p>Because the model is small and the computation light, the benchmark finishes very quickly on a PC: 10 iterations take less than 1 ms in total.</p>

<p>Model file: ./tensorflow/lite/micro/models/keyword_scrambled.tflite</p>

<p>The model structure can be viewed with Netron.</p>

<p>&nbsp;</p>

<h3>Person Detection Benchmark</h3>

<p>The person detection benchmark is more compute-intensive and takes longer to run:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>xu@VirtualBox:~/opensource/tflite-micro$ make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark
tensorflow/lite/micro/tools/make/downloads/flatbuffers already exists, skipping the download.
tensorflow/lite/micro/tools/make/downloads/kissfft already exists, skipping the download.
tensorflow/lite/micro/tools/make/downloads/pigweed already exists, skipping the download.
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/bin/person_detection_benchmark tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/benchmarks/person_detection_benchmark.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/lib/libtensorflow-microlite.a -Wl,--fatal-warnings -Wl,--gc-sections -lm
tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/bin/person_detection_benchmark non_test_binary linux
InitializeBenchmarkRunner took 192 ticks (0 ms).

WithPersonDataIterations(1) took 32299 ticks (32 ms)
DEPTHWISE_CONV_2D took 895 ticks (0 ms).
DEPTHWISE_CONV_2D took 895 ticks (0 ms).
CONV_2D took 1801 ticks (1 ms).
DEPTHWISE_CONV_2D took 424 ticks (0 ms).
CONV_2D took 1465 ticks (1 ms).
DEPTHWISE_CONV_2D took 921 ticks (0 ms).
CONV_2D took 2725 ticks (2 ms).
DEPTHWISE_CONV_2D took 206 ticks (0 ms).
CONV_2D took 1367 ticks (1 ms).
DEPTHWISE_CONV_2D took 423 ticks (0 ms).
CONV_2D took 2540 ticks (2 ms).
DEPTHWISE_CONV_2D took 102 ticks (0 ms).
CONV_2D took 1265 ticks (1 ms).
DEPTHWISE_CONV_2D took 205 ticks (0 ms).
CONV_2D took 2449 ticks (2 ms).
DEPTHWISE_CONV_2D took 204 ticks (0 ms).
CONV_2D took 2449 ticks (2 ms).
DEPTHWISE_CONV_2D took 243 ticks (0 ms).
CONV_2D took 2483 ticks (2 ms).
DEPTHWISE_CONV_2D took 202 ticks (0 ms).
CONV_2D took 2481 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2489 ticks (2 ms).
DEPTHWISE_CONV_2D took 52 ticks (0 ms).
CONV_2D took 1222 ticks (1 ms).
DEPTHWISE_CONV_2D took 90 ticks (0 ms).
CONV_2D took 2485 ticks (2 ms).
AVERAGE_POOL_2D took 8 ticks (0 ms).
CONV_2D took 3 ticks (0 ms).
RESHAPE took 0 ticks (0 ms).
SOFTMAX took 2 ticks (0 ms).

NoPersonDataIterations(1) took 32148 ticks (32 ms)
DEPTHWISE_CONV_2D took 906 ticks (0 ms).
DEPTHWISE_CONV_2D took 924 ticks (0 ms).
CONV_2D took 1762 ticks (1 ms).
DEPTHWISE_CONV_2D took 446 ticks (0 ms).
CONV_2D took 1466 ticks (1 ms).
DEPTHWISE_CONV_2D took 897 ticks (0 ms).
CONV_2D took 2692 ticks (2 ms).
DEPTHWISE_CONV_2D took 209 ticks (0 ms).
CONV_2D took 1366 ticks (1 ms).
DEPTHWISE_CONV_2D took 427 ticks (0 ms).
CONV_2D took 2548 ticks (2 ms).
DEPTHWISE_CONV_2D took 102 ticks (0 ms).
CONV_2D took 1258 ticks (1 ms).
DEPTHWISE_CONV_2D took 208 ticks (0 ms).
CONV_2D took 2473 ticks (2 ms).
DEPTHWISE_CONV_2D took 210 ticks (0 ms).
CONV_2D took 2460 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2461 ticks (2 ms).
DEPTHWISE_CONV_2D took 230 ticks (0 ms).
CONV_2D took 2443 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2467 ticks (2 ms).
DEPTHWISE_CONV_2D took 51 ticks (0 ms).
CONV_2D took 1224 ticks (1 ms).
DEPTHWISE_CONV_2D took 89 ticks (0 ms).
CONV_2D took 2412 ticks (2 ms).
AVERAGE_POOL_2D took 7 ticks (0 ms).
CONV_2D took 2 ticks (0 ms).
RESHAPE took 0 ticks (0 ms).
SOFTMAX took 2 ticks (0 ms).

WithPersonDataIterations(10) took 326947 ticks (326 ms)

NoPersonDataIterations(10) took 352888 ticks (352 ms)
</code></pre>

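<p>The person detection model takes a little over 300 ms for 10 runs, or roughly 33 to 35 ms per run. These results come from a Linux virtual machine on a Windows 10 host with an AMD R7 4800H CPU.</p>

<p>Model file: ./tensorflow/lite/micro/models/person_detect.tflite</p>

<p>Again, the model structure can be viewed with Netron.</p>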
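<p>The per-run averages can be read straight off the two summary lines of the log. The sketch below is only an illustration: the regular expression is tailored to the line format shown in this post, and <code>per_iteration_ms</code> is a hypothetical helper name.</p>

```python
import re
from typing import Dict

# Matches summary lines such as "WithPersonDataIterations(10) took 326947 ticks (326 ms)".
SUMMARY_LINE = re.compile(r"^(\w+)\((\d+)\) took (\d+) ticks \((\d+) ms\)")

# The two multi-iteration summary lines from the PC log above.
PC_LOG = """
WithPersonDataIterations(10) took 326947 ticks (326 ms)

NoPersonDataIterations(10) took 352888 ticks (352 ms)
"""


def per_iteration_ms(log_text: str) -> Dict[str, float]:
    """Average milliseconds per iteration for each multi-iteration summary line."""
    averages = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match is None:
            continue  # per-operator lines have no "(N)" after the name
        name, iterations, reported_ms = match.group(1), int(match.group(2)), int(match.group(4))
        if iterations > 1:  # single-iteration lines need no averaging
            averages[name] = reported_ms / iterations
    return averages


for name, avg in per_iteration_ms(PC_LOG).items():
    print(f"{name}: {avg:.1f} ms per run")
```

<p>For the PC log this gives 32.6 ms with a person in the image and 35.2 ms without.</p>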

<p>&nbsp;</p>

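<h2>TFLM in the HPM SDK</h2>

<h3>TFLM Middleware</h3>

<p>The HPM SDK integrates TFLM as middleware (library-like, but not built as a separate library) in the hpm_sdk\middleware subdirectory.</p>

<p>The code in this subdirectory is a trimmed-down copy of the TFLM open-source project, with many unneeded files removed.</p>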

<p>&nbsp;</p>

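<h3>TFLM Sample</h3>

<p>The HPM SDK also ships a TFLM sample in the hpm_sdk\samples\tflm subdirectory.</p>

<p>The sample code is adapted from the official person_detection example, with camera image capture and LCD output of the result added.</p>

<p>I do not have the matching camera and display on hand, so this post does not use that sample for the experiment.</p>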

<p>&nbsp;</p>

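<h2>Running the TFLM Benchmark on the HPM6750</h2>

<p>The following uses the person detection benchmark as an example of how to run a TFLM benchmark on the HPM6750.</p>

<h3>Adding the Person Detection Benchmark Sources to the HPM SDK</h3>

<p>Follow these steps to add the person detection benchmark source files to the HPM SDK environment:</p>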

<ol>
        <li>In the samples subdirectory of the HPM SDK, create a tflm_person_detect_benchmark directory, and create a src directory inside it;</li>
        <li>From the tflite-micro directory in which the person detection benchmark was already run above, copy the following files into the src directory:
        <ol>
                <li>tensorflow\lite\micro\benchmarks\person_detection_benchmark.cc</li>
                <li>tensorflow\lite\micro\benchmarks\micro_benchmark.h</li>
                <li>tensorflow\lite\micro\examples\person_detection\model_settings.h</li>
                <li>tensorflow\lite\micro\examples\person_detection\model_settings.cc</li>
        </ol>
        </li>
        <li>Create a testdata subdirectory under src, and copy all files from the following tflite-micro directory into it:
        <ol>
                <li>tensorflow\lite\micro\tools\make\gen\linux_x86_64_default\genfiles\tensorflow\lite\micro\examples\person_detection\testdata</li>
        </ol>
        </li>
        <li>In person_detection_benchmark.cc, model_settings.cc, no_person_image_data.cc, and person_image_data.cc, adjust the file paths in the affected #include directives to match the new relative locations;</li>
        <li>In person_detection_benchmark.cc, add a board_init(); call at the very start of main, and add #include &quot;board.h&quot; at the top of the file.</li>
</ol>
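<p>The copy steps in the list above can be automated. The following is a minimal sketch under stated assumptions: the file list and the generated testdata directory are taken from the steps, while the function name <code>stage_benchmark_sources</code> and the dry run against a temporary directory are only for illustration. The #include fixes and the board_init() change still have to be made by hand.</p>

```python
import shutil
import tempfile
from pathlib import Path

# Files named in the steps above, relative to the tflite-micro root.
SOURCE_FILES = [
    "tensorflow/lite/micro/benchmarks/person_detection_benchmark.cc",
    "tensorflow/lite/micro/benchmarks/micro_benchmark.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.cc",
]
# Directory of generated image data; it only exists after the PC benchmark has been built.
GENERATED_TESTDATA = (
    "tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/"
    "tensorflow/lite/micro/examples/person_detection/testdata"
)


def stage_benchmark_sources(tflm_root: Path, sample_dir: Path) -> list:
    """Copy the benchmark sources into sample_dir/src and return the new paths."""
    src_dir = sample_dir / "src"
    testdata_dir = src_dir / "testdata"
    testdata_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for rel in SOURCE_FILES:
        dst = src_dir / Path(rel).name
        shutil.copy2(tflm_root / rel, dst)
        copied.append(dst)
    for item in sorted((tflm_root / GENERATED_TESTDATA).iterdir()):
        if item.is_file():
            dst = testdata_dir / item.name
            shutil.copy2(item, dst)
            copied.append(dst)
    return copied


if __name__ == "__main__":
    # Dry run against a throwaway tree so the sketch can be tried safely.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "tflite-micro"
        for rel in SOURCE_FILES + [GENERATED_TESTDATA + "/person_image_data.cc"]:
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("// placeholder\n")
        sample = Path(tmp) / "hpm_sdk" / "samples" / "tflm_person_detect_benchmark"
        for path in stage_benchmark_sources(root, sample):
            print(path.relative_to(sample))
```

<p>For real use, pass the actual tflite-micro checkout and the samples\tflm_person_detect_benchmark directory of the SDK instead of the temporary paths.</p>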

<h3>Adding CMakeLists.txt and app.yaml</h3>

<p>Create a CMakeLists.txt file next to the src directory with the following content:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>cmake_minimum_required(VERSION 3.13)

set(CONFIG_TFLM 1)

find_package(hpm-sdk REQUIRED HINTS $ENV{HPM_SDK_BASE})
project(tflm_person_detect_benchmark)
set(CMAKE_CXX_STANDARD 11)

sdk_app_src(src/model_settings.cc)
sdk_app_src(src/person_detection_benchmark.cc)
sdk_app_src(src/testdata/no_person_image_data.cc)
sdk_app_src(src/testdata/person_image_data.cc)

sdk_app_inc(src)
sdk_ld_options("-lm")
sdk_ld_options("--std=c++11")
sdk_compile_definitions(__HPMICRO__)
sdk_compile_definitions(-DINIT_EXT_RAM_FOR_DATA=1)
# sdk_compile_options("-mabi=ilp32f")
# sdk_compile_options("-march=rv32imafc")
sdk_compile_options("-O2")
# sdk_compile_options("-O3")
set(SEGGER_LEVEL_O3 1)
generate_ses_project()
</code></pre>

<p>Create an app.yaml file next to the src directory with the following content:</p>

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>dependency:
- tflm
</code></pre>

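<h3>Building and Running the TFLM Benchmark</h3>

<p>Next comes the familiar part: build and run.</p>

<p>First, generate the project with generate_project.</p>

<p>Then connect the HPM6750 board to the PC and open the newly generated project in Embedded Studio.</p>

<p>Because the project pulls in the TFLM sources, it contains many files, so the Indexing task in the source navigator pane on the right takes a long time to finish.</p>

<p>After that, build the project with F7 and debug it with F5.</p>

<p>Once the build finishes, open a serial terminal and connect to the board's serial port at 115200 baud. Start debugging and let execution continue; the benchmark output then appears in the serial terminal:</p>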

<pre style="background:#555; padding:10px; color:#ddd !important;">
<code>==============================
hpm6750evkmini clock summary
==============================
cpu0:            816000000Hz
cpu1:            816000000Hz
axi0:            200000000Hz
axi1:            200000000Hz
axi2:            200000000Hz
ahb:             200000000Hz
mchtmr0:         24000000Hz
mchtmr1:         1000000Hz
xpi0:            133333333Hz
xpi1:            400000000Hz
dram:            166666666Hz
display:         74250000Hz
cam0:            59400000Hz
cam1:            59400000Hz
jpeg:            200000000Hz
pdma:            200000000Hz
==============================

----------------------------------------------------------------------
[ASCII-art banner: HPMicro]
----------------------------------------------------------------------
InitializeBenchmarkRunner took 114969 ticks (4 ms).

WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms).
DEPTHWISE_CONV_2D took 274903 ticks (11 ms).
CONV_2D took 868518 ticks (36 ms).
DEPTHWISE_CONV_2D took 68180 ticks (2 ms).
CONV_2D took 434392 ticks (18 ms).
DEPTHWISE_CONV_2D took 132918 ticks (5 ms).
CONV_2D took 843014 ticks (35 ms).
DEPTHWISE_CONV_2D took 33228 ticks (1 ms).
CONV_2D took 423288 ticks (17 ms).
DEPTHWISE_CONV_2D took 62040 ticks (2 ms).
CONV_2D took 833033 ticks (34 ms).
DEPTHWISE_CONV_2D took 62198 ticks (2 ms).
CONV_2D took 834644 ticks (34 ms).
DEPTHWISE_CONV_2D took 62176 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62206 ticks (2 ms).
CONV_2D took 832857 ticks (34 ms).
DEPTHWISE_CONV_2D took 62194 ticks (2 ms).
CONV_2D took 832882 ticks (34 ms).
DEPTHWISE_CONV_2D took 16050 ticks (0 ms).
CONV_2D took 438774 ticks (18 ms).
DEPTHWISE_CONV_2D took 27494 ticks (1 ms).
CONV_2D took 974362 ticks (40 ms).
AVERAGE_POOL_2D took 2323 ticks (0 ms).
CONV_2D took 1128 ticks (0 ms).
RESHAPE took 184 ticks (0 ms).
SOFTMAX took 2249 ticks (0 ms).

NoPersonDataIterations(1) took 10694160 ticks (445 ms)
DEPTHWISE_CONV_2D took 274922 ticks (11 ms).
DEPTHWISE_CONV_2D took 281095 ticks (11 ms).
CONV_2D took 515380 ticks (21 ms).
DEPTHWISE_CONV_2D took 139428 ticks (5 ms).
CONV_2D took 460039 ticks (19 ms).
DEPTHWISE_CONV_2D took 275255 ticks (11 ms).
CONV_2D took 868787 ticks (36 ms).
DEPTHWISE_CONV_2D took 68384 ticks (2 ms).
CONV_2D took 434537 ticks (18 ms).
DEPTHWISE_CONV_2D took 133071 ticks (5 ms).
CONV_2D took 843202 ticks (35 ms).
DEPTHWISE_CONV_2D took 33291 ticks (1 ms).
CONV_2D took 423388 ticks (17 ms).
DEPTHWISE_CONV_2D took 62190 ticks (2 ms).
CONV_2D took 832978 ticks (34 ms).
DEPTHWISE_CONV_2D took 62205 ticks (2 ms).
CONV_2D took 834636 ticks (34 ms).
DEPTHWISE_CONV_2D took 62213 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62239 ticks (2 ms).
CONV_2D took 832850 ticks (34 ms).
DEPTHWISE_CONV_2D took 62217 ticks (2 ms).
CONV_2D took 832856 ticks (34 ms).
DEPTHWISE_CONV_2D took 16040 ticks (0 ms).
CONV_2D took 438779 ticks (18 ms).
DEPTHWISE_CONV_2D took 27481 ticks (1 ms).
CONV_2D took 974354 ticks (40 ms).
AVERAGE_POOL_2D took 1812 ticks (0 ms).
CONV_2D took 1077 ticks (0 ms).
RESHAPE took 341 ticks (0 ms).
SOFTMAX took 901 ticks (0 ms).

WithPersonDataIterations(10) took 106960312 ticks (4456 ms)

NoPersonDataIterations(10) took 106964554 ticks (4456 ms)
</code></pre>

<p>On the HPM6750EVKMINI board, 10 consecutive runs of the person detection model take 4456 ms in total, an average of 445.6 ms per run.</p>
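<p>The tick counts and millisecond values in the log are consistent with a 24 MHz tick source, which matches the mchtmr0 entry in the clock summary. That the benchmark timer is mchtmr0 is my inference from the numbers, not something the log states, and the 1 MHz rate for the PC run is inferred the same way. A small sketch of the conversion:</p>

```python
# Tick rates: the HPM6750 value is the mchtmr0 clock from the log's clock summary;
# the PC value is inferred from the PC log (326947 ticks reported as 326 ms).
HPM_TICKS_PER_SECOND = 24_000_000
PC_TICKS_PER_SECOND = 1_000_000


def ticks_to_ms(ticks: int, ticks_per_second: int) -> int:
    """Whole milliseconds, truncated, which is how the log values appear to be derived."""
    return ticks * 1000 // ticks_per_second


def ms_per_run(total_ticks: int, ticks_per_second: int, runs: int) -> float:
    """Average milliseconds per run without the truncation."""
    return total_ticks * 1000 / ticks_per_second / runs


print(ticks_to_ms(106960312, HPM_TICKS_PER_SECOND))               # 4456
print(round(ms_per_run(106960312, HPM_TICKS_PER_SECOND, 10), 1))  # 445.7
```

<p>Working from the raw tick count gives about 445.7 ms per run; the 445.6 ms figure comes from dividing the already truncated 4456 ms by 10.</p>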

<p>&nbsp;</p>

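<h2>Running the TFLM Benchmark on the Raspberry Pi 3B+</h2>

<h3>Running the TFLM Benchmark on the Raspberry Pi</h3>

<p>On the Raspberry Pi 3B+ the procedure is the same as on the PC: download the source code, then run the same make command:</p>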

<p><code>make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark</code></p>

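<p>After a while, the benchmark results appear.</p>

<p>On the Raspberry Pi 3B+, 10 consecutive runs of the person detection model take 4186 ms in total (418.6 ms per run on average) for the image with a person, and 4190 ms in total (419 ms per run on average) for the image without a person.</p>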

<p>&nbsp;</p>

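<h3>Benchmark Comparison: HPM6750, AMD R7 4800H, and Raspberry Pi 3B+</h3>

<p>The average times for running the person detection model on the HPM6750EVKMINI board, the Raspberry Pi 3B+, and the AMD R7 4800H are summarized below:</p>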

<table>
        <thead>
                <tr>
                        <th>&nbsp;</th>
                        <th>Raspberry Pi 3B+</th>
                        <th>HPM6750EVKMINI</th>
                        <th>AMD R7 4800H</th>
                </tr>
        </thead>
        <tbody>
                <tr>
                        <td>Average time, person present (ms)</td>
                        <td>418.6</td>
                        <td>445.6</td>
                        <td>32.6</td>
                </tr>
                <tr>
                        <td>Average time, no person (ms)</td>
                        <td>419</td>
                        <td>445.6</td>
                        <td>35.2</td>
                </tr>
                <tr>
                        <td>Maximum CPU clock</td>
                        <td>1.4 GHz</td>
                        <td>816 MHz</td>
                        <td>4.2 GHz</td>
                </tr>
        </tbody>
</table>

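<p>For the TFLM person detection workload, the HPM6750EVKMINI and the Raspberry Pi 3B+ perform comparably. Although the HPM6750's 816 MHz CPU clock is lower than the 1.4 GHz of the BCM2837 Cortex-A53 in the Raspberry Pi 3B+, the two do not differ much in single-core compute performance.</p>

<p>The benchmark on the Raspberry Pi 3B+ ran on a 64-bit Debian Linux distribution, whereas the benchmark on the HPM6750 ran directly on bare metal. The task scheduler in the operating system kernel costs some CPU capacity, so this is not a strictly controlled comparison and the results are for reference only.</p>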
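<p>One way to factor out the clock difference is to estimate clock cycles per inference from the table. This is a rough sketch: it assumes each CPU ran at its listed maximum clock for the whole run, which the HPM6750 log confirms (816 MHz) but which is not verified for the other two machines, and <code>megacycles_per_inference</code> is a hypothetical helper name.</p>

```python
# name: (average ms per inference with a person in the image, max CPU clock in Hz)
# All values are taken from the comparison table.
RESULTS = {
    "Raspberry Pi 3B+": (418.6, 1.4e9),
    "HPM6750EVKMINI": (445.6, 816e6),
    "AMD R7 4800H": (32.6, 4.2e9),
}


def megacycles_per_inference(avg_ms: float, clock_hz: float) -> float:
    """Estimated millions of CPU cycles spent on one inference."""
    return avg_ms / 1000 * clock_hz / 1e6


for name, (avg_ms, clock_hz) in RESULTS.items():
    print(f"{name}: {megacycles_per_inference(avg_ms, clock_hz):.0f} Mcycles")
```

<p>Under that assumption the HPM6750 needs roughly 364 million cycles per inference against roughly 586 million on the Raspberry Pi 3B+, so per clock cycle the HPM6750 comes out ahead in this particular test.</p>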

<p>&nbsp;</p>

<h2>References</h2>

<p>For more, see the TFLM website and the project source code.</p>

<ol>
        <li>TFLite guide: <a href="https://tensorflow.google.cn/lite/guide?hl=zh-cn">https://tensorflow.google.cn/lite/guide?hl=zh-cn</a></li>
        <li>TFLM overview: <a href="https://tensorflow.google.cn/lite/microcontrollers/overview?hl=zh-cn">https://tensorflow.google.cn/lite/microcontrollers/overview?hl=zh-cn</a></li>
        <li>TensorFlow website: <a href="https://tensorflow.google.cn/">https://tensorflow.google.cn/</a></li>
</ol>

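Jacktang posted on 2022-6-26 07:13

<p>Learned about TFLM from the original poster; Google really is impressive.<br />
Also learned about the TFLM benchmarks. Looking at the Raspberry Pi run, single-core compute may not differ much, but in the multi-core era there will be a difference.</p>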