【先楫HPM6750测评】运行边缘AI框架——TFLM基准测试
[复制链接]
本帖最后由 xusiwei1236 于 2022-6-25 23:40 编辑
本篇将会介绍TFLM是什么,然后介绍TFLM官方的基准测试,以及如何在HPM6750上运行TFLM基准测试,并和树莓派3B+上的基准测试结果进行对比。
TFLM是什么?
你或许都听说过TensorFlow——由谷歌开发并开源的一个机器学习库,它支持模型训练和模型推理。
今天介绍的TFLM,全称是TensorFlow Lite for Microcontrollers,翻译过来就是“针对微控制器的TensorFlow Lite”。那TensorFlow Lite又是什么呢?
TensorFlow Lite(通常简称TFLite)其实是TensorFlow团队为了将模型部署到移动设备而开发的一套解决方案,通俗的说就是手机版的TensorFlow。下面是TensorFlow官网上关于TFLite的一段介绍:
TensorFlow Lite 是一组工具,可帮助开发者在移动设备、嵌入式设备和 loT 设备上运行模型,以便实现设备端机器学习。
而我们今天要介绍的TensorFlow Lite for Microcontrollers(TFLM)则是 TensorFlow Lite的微控制器版本。这里是官网上的一段介绍:
TensorFlow Lite for Microcontrollers (以下简称TFLM)是 TensorFlow Lite 的一个实验性移植版本,它适用于微控制器和其他一些仅有数千字节内存的设备。 它可以直接在“裸机”上运行,不需要操作系统支持、任何标准 C/C++ 库和动态内存分配。核心运行时(core runtime)在 Cortex M3 上运行时仅需 16KB,加上足以用来运行语音关键字检测模型的操作,也只需 22KB 的空间。
这三者一脉相承,都出自谷歌,区别是TensorFlow同时支持训练和推理,而后两者只支持推理。TFLite主要用于支持手机、平板等移动设备,TFLM则可以支持单片机。从发展历程上来说,后两者都是TensorFlow项目的“支线项目”。或者说这三者是一个树形的发展过程,具体来说,TFLite是从TensorFlow项目分裂出来的,TFLite-Micro是从TFLite分裂出来的,目前是三个并行发展的。在很长一段时间内,这三个项目的源码都在一个代码仓中维护,从源码目录的包含关系上来说,TensorFlow包含后两者,TFLite包含tflite-micro。
TFLM开源项目
2021年6月,谷歌将TFLM项目的源代码从TensorFlow主仓中转移到了一个独立的代码仓中。
但截至目前(2022年6月),TFLite的源代码仍然以TensorFlow项目中的一个子目录进行维护。这也可以看出谷歌对TFLM的重视。
TFLM代码仓链接:
下载命令: git clone
TFLM主要业务代码位于tensorflow\lite\micro子目录:
TFLM官方支持make和bazel构建。
TFLM基准测试
TFLM代码仓顶层的README.md中给出了基准测试文档链接:
该文档篇幅不长:
通过这个目录我们可以知道,TFLM提供了两个基准测试(实际有三个),分别是:
- 关键词基准测试
- 关键词基准测试使用的是程序运行时生产的随机数据作为输入,所以它的输出是没有意义的
- 人体检测基准测试
- 人体检测基准测试使用了两张bmp图片作为输入
- 具体位于tensorflow\lite\micro\examples\person_detection\testdata子目录
下载依赖的软件
在PC的Linux系统上,运行TFLM基准测试之前,需要先安装依赖的一些工具:
sudo apt install git unzip wget python3 python3-pip
基准测试命令
参考”Run on x86”,在x86 PC上运行关键词基准测试的命令是:
make -f tensorflow/lite/micro/tools/make/Makefile run_keyword_benchmark
在PC上运行人体检测基准测试的命令是:
make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark
执行这两个命令,会依次执行如下步骤:
- 调用几个下载脚本,下载依赖库和数据集;
- 编译测试程序;
- 运行测试程序;
tensorflow/lite/micro/tools/make/Makefile 代码片段中,可以看到调用了几个下载脚本:
flatbuffers_download.sh和kissfft_download.sh脚本第一次执行时,会将相应的压缩包下载到本地,并解压,具体细节参见代码内容;
pigweed_download.sh脚本会克隆一个代码仓,再检出一个特定版本:
这里需要注意的是,代码仓https://pigweed.googlesource.com/pigweed/pigweed 国内一般无法访问(因为域名googlesource.com被禁了)。将此连接修改为我克隆好的代码仓: 可以解决因为国内无法访问googlesource.com而无法下载pigweed测试数据的问题。
基准测试的构建规则
tensorflow/lite/micro/tools/make/Makefile 文件是Makefile总入口文件,该文件中定义了一些makefile宏函数,并通过include引入了其他文件,包括定义了两个基准测试编译规则的tensorflow/lite/micro/benchmarks/Makefile.inc 文件:
KEYWORD_BENCHMARK_SRCS := \\
tensorflow/lite/micro/benchmarks/keyword_benchmark.cc
KEYWORD_BENCHMARK_GENERATOR_INPUTS := \\
tensorflow/lite/micro/models/keyword_scrambled.tflite
KEYWORD_BENCHMARK_HDRS := \\
tensorflow/lite/micro/benchmarks/micro_benchmark.h
KEYWORD_BENCHMARK_8BIT_SRCS := \\
tensorflow/lite/micro/benchmarks/keyword_benchmark_8bit.cc
KEYWORD_BENCHMARK_8BIT_GENERATOR_INPUTS := \\
tensorflow/lite/micro/models/keyword_scrambled_8bit.tflite
KEYWORD_BENCHMARK_8BIT_HDRS := \\
tensorflow/lite/micro/benchmarks/micro_benchmark.h
PERSON_DETECTION_BENCHMARK_SRCS := \\
tensorflow/lite/micro/benchmarks/person_detection_benchmark.cc
PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS := \\
tensorflow/lite/micro/examples/person_detection/testdata/person.bmp \\
tensorflow/lite/micro/examples/person_detection/testdata/no_person.bmp
ifneq ($(CO_PROCESSOR),ethos_u)
PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS += \\
tensorflow/lite/micro/models/person_detect.tflite
else
# Ethos-U use a Vela optimized version of the original model.
PERSON_DETECTION_BENCHMARK_SRCS += \\
$(GENERATED_SRCS_DIR)tensorflow/lite/micro/models/person_detect_model_data_vela.cc
endif
PERSON_DETECTION_BENCHMARK_HDRS := \\
tensorflow/lite/micro/examples/person_detection/model_settings.h \\
tensorflow/lite/micro/benchmarks/micro_benchmark.h
# Builds a standalone binary.
$(eval $(call microlite_test,keyword_benchmark,\\
$(KEYWORD_BENCHMARK_SRCS),$(KEYWORD_BENCHMARK_HDRS),$(KEYWORD_BENCHMARK_GENERATOR_INPUTS)))
# Builds a standalone binary.
$(eval $(call microlite_test,keyword_benchmark_8bit,\\
$(KEYWORD_BENCHMARK_8BIT_SRCS),$(KEYWORD_BENCHMARK_8BIT_HDRS),$(KEYWORD_BENCHMARK_8BIT_GENERATOR_INPUTS)))
$(eval $(call microlite_test,person_detection_benchmark,\\
$(PERSON_DETECTION_BENCHMARK_SRCS),$(PERSON_DETECTION_BENCHMARK_HDRS),$(PERSON_DETECTION_BENCHMARK_GENERATOR_INPUTS)))
从这里可以看到,实际上有三个基准测试程序,比文档多了一个 keyword_benchmark_8bit ,应该是 keword_benchmark的8bit量化版本。另外,可以看到有三个tflite的模型文件。
Keyword基准测试
关键词基准测试使用的模型较小,比较适合在STM32 F3/F4这类主频低于100MHz的MCU。
这个基准测试的模型比较小,计算量也不大,所以在PC上运行这个基准测试的耗时非常短:
可以看到,在PC上运行关键词唤醒的速度非常快,10次时间不到1毫秒。
模型文件路径为:./tensorflow/lite/micro/models/keyword_scrambled.tflite
模型结构可以使用Netron软件查看。
Person detection基准测试
人体检测基准测试的计算量相对要大一些,运行的时间也要长一些:
xu@VirtualBox:~/opensource/tflite-micro$ make -f tensorflow/lite/micro/tools/make/Makefile run_person_detection_benchmark
tensorflow/lite/micro/tools/make/downloads/flatbuffers already exists, skipping the download.
tensorflow/lite/micro/tools/make/downloads/kissfft already exists, skipping the download.
tensorflow/lite/micro/tools/make/downloads/pigweed already exists, skipping the download.
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -Os -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -c tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.cc -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.o
g++ -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wmissing-field-initializers -Wstrict-aliasing -Wno-unused-parameter -DTF_LITE_USE_CTIME -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Itensorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Itensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads/kissfft -o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/bin/person_detection_benchmark tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/benchmarks/person_detection_benchmark.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/person_image_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/examples/person_detection/testdata/no_person_image_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/obj/core/tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/tensorflow/lite/micro/models/person_detect_model_data.o tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/lib/libtensorflow-microlite.a -Wl,--fatal-warnings -Wl,--gc-sections -lm
tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/bin/person_detection_benchmark non_test_binary linux
InitializeBenchmarkRunner took 192 ticks (0 ms).
WithPersonDataIterations(1) took 32299 ticks (32 ms)
DEPTHWISE_CONV_2D took 895 ticks (0 ms).
DEPTHWISE_CONV_2D took 895 ticks (0 ms).
CONV_2D took 1801 ticks (1 ms).
DEPTHWISE_CONV_2D took 424 ticks (0 ms).
CONV_2D took 1465 ticks (1 ms).
DEPTHWISE_CONV_2D took 921 ticks (0 ms).
CONV_2D took 2725 ticks (2 ms).
DEPTHWISE_CONV_2D took 206 ticks (0 ms).
CONV_2D took 1367 ticks (1 ms).
DEPTHWISE_CONV_2D took 423 ticks (0 ms).
CONV_2D took 2540 ticks (2 ms).
DEPTHWISE_CONV_2D took 102 ticks (0 ms).
CONV_2D took 1265 ticks (1 ms).
DEPTHWISE_CONV_2D took 205 ticks (0 ms).
CONV_2D took 2449 ticks (2 ms).
DEPTHWISE_CONV_2D took 204 ticks (0 ms).
CONV_2D took 2449 ticks (2 ms).
DEPTHWISE_CONV_2D took 243 ticks (0 ms).
CONV_2D took 2483 ticks (2 ms).
DEPTHWISE_CONV_2D took 202 ticks (0 ms).
CONV_2D took 2481 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2489 ticks (2 ms).
DEPTHWISE_CONV_2D took 52 ticks (0 ms).
CONV_2D took 1222 ticks (1 ms).
DEPTHWISE_CONV_2D took 90 ticks (0 ms).
CONV_2D took 2485 ticks (2 ms).
AVERAGE_POOL_2D took 8 ticks (0 ms).
CONV_2D took 3 ticks (0 ms).
RESHAPE took 0 ticks (0 ms).
SOFTMAX took 2 ticks (0 ms).
NoPersonDataIterations(1) took 32148 ticks (32 ms)
DEPTHWISE_CONV_2D took 906 ticks (0 ms).
DEPTHWISE_CONV_2D took 924 ticks (0 ms).
CONV_2D took 1762 ticks (1 ms).
DEPTHWISE_CONV_2D took 446 ticks (0 ms).
CONV_2D took 1466 ticks (1 ms).
DEPTHWISE_CONV_2D took 897 ticks (0 ms).
CONV_2D took 2692 ticks (2 ms).
DEPTHWISE_CONV_2D took 209 ticks (0 ms).
CONV_2D took 1366 ticks (1 ms).
DEPTHWISE_CONV_2D took 427 ticks (0 ms).
CONV_2D took 2548 ticks (2 ms).
DEPTHWISE_CONV_2D took 102 ticks (0 ms).
CONV_2D took 1258 ticks (1 ms).
DEPTHWISE_CONV_2D took 208 ticks (0 ms).
CONV_2D took 2473 ticks (2 ms).
DEPTHWISE_CONV_2D took 210 ticks (0 ms).
CONV_2D took 2460 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2461 ticks (2 ms).
DEPTHWISE_CONV_2D took 230 ticks (0 ms).
CONV_2D took 2443 ticks (2 ms).
DEPTHWISE_CONV_2D took 203 ticks (0 ms).
CONV_2D took 2467 ticks (2 ms).
DEPTHWISE_CONV_2D took 51 ticks (0 ms).
CONV_2D took 1224 ticks (1 ms).
DEPTHWISE_CONV_2D took 89 ticks (0 ms).
CONV_2D took 2412 ticks (2 ms).
AVERAGE_POOL_2D took 7 ticks (0 ms).
CONV_2D took 2 ticks (0 ms).
RESHAPE took 0 ticks (0 ms).
SOFTMAX took 2 ticks (0 ms).
WithPersonDataIterations(10) took 326947 ticks (326 ms)
NoPersonDataIterations(10) took 352888 ticks (352 ms)
可以看到,人像检测模型运行10次的时间是三百多毫秒,一次平均三十几毫秒。这是在配备AMD标压R7 4800 CPU的Win10虚拟机下运行的结果。
模型文件路径为:./tensorflow/lite/micro/models/person_detect.tflite
同样,可以使用Netron查看模型结构。
HPM SDK中的TFLM
TFLM中间件
HPM SDK中集成了TFLM中间件(类似库,但是没有单独编译为库),位于hpm_sdk\middleware子目录:
这个子目录的代码是由TFLM开源项目裁剪而来,删除了很多不需要的文件。
TFLM 示例
HPM SDK中也提供了TFLM示例,位于hpm_sdk\samples\tflm子目录:
示例代码是从官方的persion_detection示例修改而来,添加了摄像头采集图像和LCD显示结果。
由于我手里没有配套的摄像头和显示屏,所以本篇没有以这个示例作为实验。
在HPM6750上运行TFLM基准测试
接下来以person detection benchmark为例,讲解如何在HPM6750上运行TFLM基准测试。
将person detection benchmark源代码添加到HPM SDK环境
按照如下步骤,在HPM SDK环境中添加person detection benchmark源代码文件:
- 在HPM SDK的samples子目录创建tflm_person_detect_benchmark目录,并在其中创建src目录;
- 从上文描述的已经运行过person detection benchmark的tflite-micro目录中拷贝如下文件到src目录:
- tensorflow\lite\micro\benchmarks\person_detection_benchmark.cc
- tensorflow\lite\micro\benchmarks\micro_benchmark.h
- tensorflow\lite\micro\examples\person_detection\model_settings.h
- tensorflow\lite\micro\examples\person_detection\model_settings.cc
- 在src目录创建testdata子目录,并将tflite-micro目录下如下目录中的文件拷贝全部到testdata中:
- tensorflow\lite\micro\tools\make\gen\linux_x86_64_default\genfiles\tensorflow\lite\micro\examples\person_detection\testdata
- 修改person_detection_benchmark.cc、model_settings.cc、no_person_image_data.cc、person_image_data.cc 文件中部分#include预处理指令的文件路径(根据拷贝后的相对路径修改);
- person_detection_benchmark.cc文件中,main函数的一开始添加一行board_init();、顶部添加一行#include "board.h”
添加CMakeLists.txt和app.yaml文件
在src平级创建CMakeLists.txt文件,内容如下:
cmake_minimum_required(VERSION 3.13)
set(CONFIG_TFLM 1)
find_package(hpm-sdk REQUIRED HINTS $ENV{HPM_SDK_BASE})
project(tflm_person_detect_benchmark)
set(CMAKE_CXX_STANDARD 11)
sdk_app_src(src/model_settings.cc)
sdk_app_src(src/person_detection_benchmark.cc)
sdk_app_src(src/testdata/no_person_image_data.cc)
sdk_app_src(src/testdata/person_image_data.cc)
sdk_app_inc(src)
sdk_ld_options("-lm")
sdk_ld_options("--std=c++11")
sdk_compile_definitions(__HPMICRO__)
sdk_compile_definitions(-DINIT_EXT_RAM_FOR_DATA=1)
# sdk_compile_options("-mabi=ilp32f")
# sdk_compile_options("-march=rv32imafc")
sdk_compile_options("-O2")
# sdk_compile_options("-O3")
set(SEGGER_LEVEL_O3 1)
generate_ses_project()
在src平级创建app.yaml文件,内容如下:
dependency:
- tflm
编译和运行TFLM基准测试
接下来就是大家熟悉的——编译运行了。
首先,使用generate_project生产项目:
接着,将HPM6750开发板连接到PC,在Embedded Studio中打卡刚刚生产的项目:
这个项目因为引入了TFLM的源码,文件较多,所以右边的源码导航窗里面的Indexing要执行很久才能结束。
然后,就可以使用F7编译、F5调试项目了:
编译完成后,先打卡串口终端连接到设备串口,波特率115200。启动调试后,直接继续运行,就可以在串口终端中看到基准测试的输出了:
==============================
hpm6750evkmini clock summary
==============================
cpu0: 816000000Hz
cpu1: 816000000Hz
axi0: 200000000Hz
axi1: 200000000Hz
axi2: 200000000Hz
ahb: 200000000Hz
mchtmr0: 24000000Hz
mchtmr1: 1000000Hz
xpi0: 133333333Hz
xpi1: 400000000Hz
dram: 166666666Hz
display: 74250000Hz
cam0: 59400000Hz
cam1: 59400000Hz
jpeg: 200000000Hz
pdma: 200000000Hz
==============================
----------------------------------------------------------------------
$$\\ $$\\ $$$$$$$\\ $$\\ $$\\ $$\\
$$ | $$ |$$ __$$\\ $$$\\ $$$ |\\__|
$$ | $$ |$$ | $$ |$$$$\\ $$$$ |$$\\ $$$$$$$\\ $$$$$$\\ $$$$$$\\
$$$$$$$$ |$$$$$$$ |$$\\$$\\$$ $$ |$$ |$$ _____|$$ __$$\\ $$ __$$\\
$$ __$$ |$$ ____/ $$ \\$$$ $$ |$$ |$$ / $$ | \\__|$$ / $$ |
$$ | $$ |$$ | $$ |\\$ /$$ |$$ |$$ | $$ | $$ | $$ |
$$ | $$ |$$ | $$ | \\_/ $$ |$$ |\\$$$$$$$\\ $$ | \\$$$$$$ |
\\__| \\__|\\__| \\__| \\__|\\__| \\_______|\\__| \\______/
----------------------------------------------------------------------
InitializeBenchmarkRunner took 114969 ticks (4 ms).
WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms).
DEPTHWISE_CONV_2D took 274903 ticks (11 ms).
CONV_2D took 868518 ticks (36 ms).
DEPTHWISE_CONV_2D took 68180 ticks (2 ms).
CONV_2D took 434392 ticks (18 ms).
DEPTHWISE_CONV_2D took 132918 ticks (5 ms).
CONV_2D took 843014 ticks (35 ms).
DEPTHWISE_CONV_2D took 33228 ticks (1 ms).
CONV_2D took 423288 ticks (17 ms).
DEPTHWISE_CONV_2D took 62040 ticks (2 ms).
CONV_2D took 833033 ticks (34 ms).
DEPTHWISE_CONV_2D took 62198 ticks (2 ms).
CONV_2D took 834644 ticks (34 ms).
DEPTHWISE_CONV_2D took 62176 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62206 ticks (2 ms).
CONV_2D took 832857 ticks (34 ms).
DEPTHWISE_CONV_2D took 62194 ticks (2 ms).
CONV_2D took 832882 ticks (34 ms).
DEPTHWISE_CONV_2D took 16050 ticks (0 ms).
CONV_2D took 438774 ticks (18 ms).
DEPTHWISE_CONV_2D took 27494 ticks (1 ms).
CONV_2D took 974362 ticks (40 ms).
AVERAGE_POOL_2D took 2323 ticks (0 ms).
CONV_2D took 1128 ticks (0 ms).
RESHAPE took 184 ticks (0 ms).
SOFTMAX took 2249 ticks (0 ms).
NoPersonDataIterations(1) took 10694160 ticks (445 ms)
DEPTHWISE_CONV_2D took 274922 ticks (11 ms).
DEPTHWISE_CONV_2D took 281095 ticks (11 ms).
CONV_2D took 515380 ticks (21 ms).
DEPTHWISE_CONV_2D took 139428 ticks (5 ms).
CONV_2D took 460039 ticks (19 ms).
DEPTHWISE_CONV_2D took 275255 ticks (11 ms).
CONV_2D took 868787 ticks (36 ms).
DEPTHWISE_CONV_2D took 68384 ticks (2 ms).
CONV_2D took 434537 ticks (18 ms).
DEPTHWISE_CONV_2D took 133071 ticks (5 ms).
CONV_2D took 843202 ticks (35 ms).
DEPTHWISE_CONV_2D took 33291 ticks (1 ms).
CONV_2D took 423388 ticks (17 ms).
DEPTHWISE_CONV_2D took 62190 ticks (2 ms).
CONV_2D took 832978 ticks (34 ms).
DEPTHWISE_CONV_2D took 62205 ticks (2 ms).
CONV_2D took 834636 ticks (34 ms).
DEPTHWISE_CONV_2D took 62213 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62239 ticks (2 ms).
CONV_2D took 832850 ticks (34 ms).
DEPTHWISE_CONV_2D took 62217 ticks (2 ms).
CONV_2D took 832856 ticks (34 ms).
DEPTHWISE_CONV_2D took 16040 ticks (0 ms).
CONV_2D took 438779 ticks (18 ms).
DEPTHWISE_CONV_2D took 27481 ticks (1 ms).
CONV_2D took 974354 ticks (40 ms).
AVERAGE_POOL_2D took 1812 ticks (0 ms).
CONV_2D took 1077 ticks (0 ms).
RESHAPE took 341 ticks (0 ms).
SOFTMAX took 901 ticks (0 ms).
WithPersonDataIterations(10) took 106960312 ticks (4456 ms)
NoPersonDataIterations(10) took 106964554 ticks (4456 ms)
可以看到,在HPM6750EVKMINI开发板上,连续运行10次人像检测模型,总体耗时4456毫秒,每次平均耗时445.6毫秒。
在树莓派3B+上运行TFLM基准测试
在树莓派上运行TFLM基准测试
树莓派3B+上可以和PC上类似,下载源码后,直接运行PC端的make命令:
make -f tensorflow/lite/micro/tools/make/Makefile
一段时间后,即可得到基准测试结果:
可以看到,在树莓派3B+上的,对于有人脸的图片,连续运行10次人脸检测模型,总体耗时4186毫秒,每次平均耗时418.6毫秒;对于无人脸的图片,连续运行10次人脸检测模型,耗时4190毫秒,每次平均耗时419毫秒。
HPM6750和AMD R7 4800H、树莓派3B+的基准测试结果对比
这里将HPM6750EVKMINI开发板、树莓派3B+和AMD R7 4800H上运行人脸检测模型的平均耗时结果汇总如下:
|
树莓派3B+ |
HPM6750EVKMINI |
AMD R7 4800H |
有人脸平均耗时(ms) |
418.6 |
445.6 |
32.6 |
无人脸平均耗时(ms) |
419 |
445.6 |
35.2 |
CPU最高主频(Hz) |
1.4G |
816M |
4.2G |
可以看到,在TFLM人脸检测模型计算场景下,HPM6750EVKMINI和树莓派3B+成绩相当。虽然HPM6750的816MHz CPU频率比树莓派3B+搭载的BCM2837 Cortex-A53 1.4GHz的主频低,但是在单核心计算能力上平没有相差太多。
这里树莓派3B+上的TFLM基准测试程序是运行在64位Debian Linux发行版上的,而HPM6750上的测试程序是直接运行在裸机上的。由于操作系统内核中任务调度器的存在,会对CPU的计算能力带来一定损耗。所以,这里进行的并不是一个严格意义上的对比测试,测试结果仅供参考。
参考连接
更多内容可以参考TFLM官网和项目源码。
- TFLite指南:https://tensorflow.google.cn/lite/guide?hl=zh-cn
- TFLM介绍:https://tensorflow.google.cn/lite/microcontrollers/overview?hl=zh-cn
- TensorFlow官网:https://tensorflow.google.cn/
|