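Introduction
Earlier posts covered the development environment and related setup. This one takes a qualitative look at the board's performance in several areas.
Benchmark score (CoreMark)
Open a WSL terminal.
Download the code: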
git clone
cd coremark/
vi simple/core_portme.h
Change
#define COMPILER_FLAGS \
FLAGS_STR /* "Please put compiler flags here (e.g. -o3)" */
#endif
to
#define COMPILER_FLAGS \
"-O3" /* "Please put compiler flags here (e.g. -o3)" */
#endif
When compiling with -O0, set the string to "-O0" instead. Also change
typedef ee_u32 ee_ptr_int;
to
typedef unsigned long ee_ptr_int;
Compile:
export PATH=$PATH:~/lichee/out/sun8iw11p1/linux/common/buildroot/host/usr/bin
arm-linux-gnueabihf-gcc -o coremarko0 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=100000 -Isimple -I. -O0
arm-linux-gnueabihf-gcc -o coremarko3 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=100000 -Isimple -I. -O3
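The two compile commands differ only in the optimization flag and the output name, so they can be folded into one helper. The sketch below also passes the flag string on the command line through FLAGS_STR, which the unmodified simple/core_portme.h uses as the default for COMPILER_FLAGS, so the header does not need to be edited between builds. If the header has already been edited to a literal string as described above, FLAGS_STR is simply ignored. The function name build_coremark and the DRY_RUN and CC variables are hypothetical names introduced here, not part of CoreMark.

```shell
# Hypothetical helper: build one CoreMark binary for the given optimization flag.
# Set DRY_RUN=1 to print the compiler command instead of running it.
# Set CC to override the cross compiler (assumed default: arm-linux-gnueabihf-gcc).
build_coremark() {
    opt=$1                                                      # e.g. -O0 or -O3
    name=coremark$(printf '%s' "${opt#-}" | tr 'A-Z' 'a-z')     # coremarko0, coremarko3
    ${DRY_RUN:+echo} "${CC:-arm-linux-gnueabihf-gcc}" -o "$name" \
        core_list_join.c core_main.c core_matrix.c core_state.c core_util.c \
        simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=100000 \
        "-DFLAGS_STR=\"$opt\"" -Isimple -I. "$opt"
}

# Usage, from inside the coremark/ directory:
#   for opt in -O0 -O3; do build_coremark "$opt"; done
```

Keeping the reported flag string and the real flag in one variable means the "Compiler flags" line in the log cannot drift from the actual build.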
Copy the binaries to the Windows side:
cp coremarko0 coremarko3 /mnt/d
Then transfer them to the board over the serial port with rz.
Make them executable:
chmod +x coremarko0 coremarko3
Run:
./coremarko0
./coremarko3
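The results are below. The optimization level makes a large difference: about 680 iterations/s at -O0 versus about 3406 iterations/s at -O3, roughly a 5x gap. Note that the -O3 log still prints "Compiler flags : -O0". That field only echoes the COMPILER_FLAGS string from core_portme.h, which apparently was not set back to "-O3" before the second build, so it does not reflect the optimization level actually used.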
root@T3/A40i-Tronlong:~# ./coremarko0
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 146952831
Total time (secs): 146.952831
Iterations/Sec : 680.490463
Iterations : 100000
Compiler version : GCC9.4.0
Compiler flags : -O0
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 680.490463 / GCC9.4.0 -O0 / STACK
root@T3/A40i-Tronlong:~# ./coremarko3
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 29362505
Total time (secs): 29.362505
Iterations/Sec : 3405.703975
Iterations : 100000
Compiler version : GCC9.4.0
Compiler flags : -O0
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xd340
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 3405.703975 / GCC9.4.0 -O0 / STACK
To compare against published scores for the same CPU type, search for Cortex-A7 at https://www.eembc.org/coremark/scores.php
The CPU tested here is a Cortex-A7 at 1.2GHz.
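Published CoreMark scores are usually compared per MHz, so it helps to normalize the numbers from the logs above. The sketch below divides the -O3 score by the -O0 score and by the 1200MHz clock. The helper name ratio is hypothetical, and the result is for the single-threaded run shown here.

```shell
# Hypothetical helper: divide two numbers and print the quotient with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 3405.703975 680.490463   # speedup of -O3 over -O0
ratio 3405.703975 1200         # CoreMark/MHz at -O3, assuming a 1200MHz clock
```

This works out to about 5.00x between the two builds and about 2.84 CoreMark/MHz for the -O3 build.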
RAM performance test
In WSL.
Download the code:
git clone https://github.com/qinyunti/STREAM.git
cd STREAM/
Compile:
export PATH=$PATH:~/lichee/out/sun8iw11p1/linux/common/buildroot/host/usr/bin
arm-linux-gnueabihf-gcc -O3 -DSTREAM_ARRAY_SIZE=5000000 stream.c -o stream.5M
Copy the binary to the Windows side:
cp stream.5M /mnt/d
Then transfer it to the board over the serial port with rz.
Make it executable:
chmod +x stream.5M
Run:
./stream.5M
The results:
root@T3/A40i-Tronlong:~# ./stream.5M
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 5000000 (elements), Offset = 0 (elements)
Memory per array = 38.1 MiB (= 0.0 GiB).
Total memory required = 114.4 MiB (= 0.1 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 52219 microseconds.
(= 52219 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 972.1 0.083436 0.082297 0.084256
Scale: 868.5 0.092398 0.092110 0.092609
Add: 829.7 0.144716 0.144639 0.144788
Triad: 683.4 0.175755 0.175587 0.175917
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
Reference: https://www.cs.virginia.edu/stream/ref.html
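The "Best Rate" column can be reproduced from the "Min time" column. STREAM counts the bytes moved per kernel: Copy and Scale touch two arrays, Add and Triad touch three, each element is 8 bytes, and 1 MB is taken as 10^6 bytes. The sketch below redoes that arithmetic for the run above. The helper name stream_rate is hypothetical.

```shell
# Hypothetical helper: STREAM-style bandwidth in MB/s (1 MB = 10^6 bytes).
# $1 = arrays touched per iteration, $2 = elements per array, $3 = best time in seconds
stream_rate() {
    awk -v arrays="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", arrays * 8 * n / t / 1e6 }'
}

stream_rate 2 5000000 0.082297   # Copy
stream_rate 2 5000000 0.092110   # Scale
stream_rate 3 5000000 0.144639   # Add
stream_rate 3 5000000 0.175587   # Triad
```

The four values match the Best Rate column of the log, which confirms that the reported bandwidth is derived from the minimum time rather than the average.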
RAM stress test
Reference: https://pyropus.ca./software/memtester/
In WSL.
Download the code:
wget https://pyropus.ca./software/memtester/old-versions/memtester-4.5.1.tar.gz
tar -xvf memtester-4.5.1.tar.gz
cd memtester-4.5.1/
Compile:
export PATH=$PATH:~/lichee/out/sun8iw11p1/linux/common/buildroot/host/usr/bin
arm-linux-gnueabihf-gcc -O3 memtester.c tests.c -o memtester
Copy the binary to the Windows side, then download it to the board and make it executable:
cp memtester /mnt/d
chmod +x memtester
Run:
./memtester
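memtester needs the amount of memory to test as its first argument. Without a loop count it keeps testing until interrupted, so pass the number of loops as the last argument to limit the run.
For example:
./memtester 128M 1
128M is the amount of RAM to test.
1 means run the test loop once.
The -p option specifies a physical base address directly, which is useful during board bring-up when a specific physical region has to be tested.
The output of this run: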
root@T3/A40i-Tronlong:~# ./memtester 128M 1
memtester version 4.5.1 (32-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 128MB (134217728 bytes)
got 128MB (134217728 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
Done.
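For unattended runs it is easier to check the exit status than to scan the log. According to the memtester man page, the exit status is 0 when everything passes and otherwise a bitwise OR of 1 (allocation, locking, or invocation error), 2 (stuck address test failed), and 4 (one of the other tests failed). The sketch below decodes that value. The helper name describe_memtester_status is hypothetical, and the bit meanings should be checked against the man page of the version in use.

```shell
# Hypothetical helper: turn a memtester exit status into readable text.
# Bit meanings follow the memtester man page (assumed unchanged in 4.5.1).
describe_memtester_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "all tests passed"
        return 0
    fi
    if [ $(( status & 1 )) -ne 0 ]; then echo "allocation, locking, or invocation error"; fi
    if [ $(( status & 2 )) -ne 0 ]; then echo "stuck address test failed"; fi
    if [ $(( status & 4 )) -ne 0 ]; then echo "one of the other tests failed"; fi
    return 0
}

# Usage on the board:
#   ./memtester 128M 1
#   describe_memtester_status $?
```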
eMMC performance test
dmesg | grep mmc
The board has a 4GB eMMC:
[ 4.008550] mmc0: new HS200 MMC card at address 0001
[ 4.009409] mmcblk0: mmc0:0001 S04111 3.56 GiB
and a 16GB SD card:
[ 8.202017] mmc1: new high speed SDHC card at address aaaa
[ 8.208872] mmcblk1: mmc1:aaaa SL16G 14.8 GiB
The eMMC runs in HS200 mode.
Speed mode | Clock (MHz)
Default Speed | 26
High Speed SDR | 52
High Speed DDR | 52
HS200 | 200
HS400 | 200
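The clock alone does not show why HS400 is faster than HS200, since both run at 200MHz. The peak bus throughput also depends on the bus width and on whether data is transferred on one clock edge (SDR) or both (DDR). The sketch below computes the theoretical peak for each mode, assuming an 8-bit data bus. The bus width on this board is an assumption, and the helper name bus_peak is hypothetical.

```shell
# Hypothetical helper: theoretical peak bus throughput in MB/s.
# $1 = clock in MHz, $2 = bus width in bits, $3 = transfers per clock (1 = SDR, 2 = DDR)
bus_peak() {
    echo $(( $1 * $2 * $3 / 8 ))
}

bus_peak 26 8 1     # Default Speed
bus_peak 52 8 1     # High Speed SDR
bus_peak 52 8 2     # High Speed DDR
bus_peak 200 8 1    # HS200
bus_peak 200 8 2    # HS400
```

Under these assumptions HS200 tops out at 200MB/s on the bus and HS400 at 400MB/s. Measured file-system throughput will be lower, since it also includes flash, controller, and file-system overhead.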
Check the mounts with df. The read/write tests use the / directory.
root@T3/A40i-Tronlong:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 2029971 514680 1406338 27% /
devtmpfs 107996 0 107996 0% /dev
tmpfs 124604 0 124604 0% /dev/shm
tmpfs 124604 8 124596 0% /tmp
tmpfs 124604 12 124592 0% /run
cgroup 124604 0 124604 0% /sys/fs/cgroup
root@T3/A40i-Tronlong:~#
With no SD card inserted, / is mounted on the eMMC.
Operation | bs/count (1GB total) | Command | Result
Read | 16k/65536 | time dd if=test.bin of=/dev/null bs=16k count=65536 | 98.5MB/s
Read | 4k/262144 | - | -
Read | 1k/1048576 | - | -
Write | 16k/65536 | time dd if=/dev/zero of=/test.bin bs=16k count=65536 | 27.24MB/s
Write | 4k/262144 | - | -
Write | 1k/1048576 | - | -
root@T3/A40i-Tronlong:/# time dd if=/dev/zero of=/test.bin bs=16k count=65536
65536+0 records in
65536+0 records out
real 0m37.581s
user 0m0.080s
sys 0m15.230s
root@T3/A40i-Tronlong:/# time dd if=test.bin of=/dev/null bs=16k count=65536
65536+0 records in
65536+0 records out
real 0m10.386s
user 0m0.070s
sys 0m4.040s
root@T3/A40i-Tronlong:/#
These figures are for reference only. The tests do not account for the effect of caching.
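The throughput figures in the table come from dividing the amount of data by the "real" time that dd reports. The sketch below redoes that division for the two eMMC runs, with 16KiB blocks, 65536 blocks, and the elapsed seconds from the logs. The helper name dd_rate is hypothetical. The unit is MiB/s (1024 x 1024 bytes), which is what the table's MB/s values correspond to.

```shell
# Hypothetical helper: throughput in MiB/s from dd parameters and elapsed time.
# $1 = block size in KiB, $2 = block count, $3 = elapsed "real" time in seconds
dd_rate() {
    awk -v bs="$1" -v count="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", bs * count / 1024 / secs }'
}

dd_rate 16 65536 10.386   # eMMC read,  real 0m10.386s
dd_rate 16 65536 37.581   # eMMC write, real 0m37.581s
```

The results agree with the table once the table values are read as rounded down.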
SD card performance test
Insert the SD card and reboot. The SD card is then mounted automatically at /root.
Operation | bs/count (1GB total) | Command | Result
Read | 16k/65536 | time dd if=/root/test.bin of=/dev/null bs=16k count=65536 | 21.25MB/s
Read | 4k/262144 | - | -
Read | 1k/1048576 | - | -
Write | 16k/65536 | time dd if=/dev/zero of=/root/test.bin bs=16k count=65536 | 11MB/s
Write | 4k/262144 | - | -
Write | 1k/1048576 | - | -
root@T3/A40i-Tronlong:~# time dd if=/dev/zero of=/root/test.bin bs=16k count=65536
65536+0 records in
65536+0 records out
real 1m32.412s
user 0m0.330s
sys 0m17.700s
root@T3/A40i-Tronlong:~# time dd if=/root/test.bin of=/dev/null bs=16k count=65536
65536+0 records in
65536+0 records out
real 0m48.177s
user 0m0.100s
sys 0m4.350s
Throughput depends on the SD card itself, and caching is again not accounted for, so treat these results as reference only.
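One way to reduce the influence of the page cache, shown here as a sketch and not as what was run above, is to make dd flush the file to the device before it exits and to drop the page cache before the read. conv=fsync is supported by GNU dd and by BusyBox dd when that feature is compiled in. Whether the dd on this board supports it is an assumption. Writing to /proc/sys/vm/drop_caches requires root. The function names flushed_write and cold_read are hypothetical.

```shell
# Hypothetical helper: write zeros and include the final flush in the elapsed time.
# $1 = target file, $2 = block size (e.g. 16k), $3 = block count
flushed_write() {
    dd if=/dev/zero of="$1" bs="$2" count="$3" conv=fsync 2>/dev/null
}

# Hypothetical helper: read a file back after asking the kernel to drop cached pages.
# $1 = file to read, $2 = block size
cold_read() {
    sync
    # Needs root; skipped silently when the file is not writable.
    if [ -w /proc/sys/vm/drop_caches ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    fi
    dd if="$1" of=/dev/null bs="$2" 2>/dev/null
}

# Usage on the board, same sizes as the tables above:
#   time flushed_write /root/test.bin 16k 65536
#   time cold_read /root/test.bin 16k
```

With the flush included, the write time covers data actually reaching the card, and the read no longer benefits from pages left in RAM by the preceding write.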
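Summary
The tests above cover CPU, RAM, and storage, and overall the board performs well. Treat every result as a reference only: numbers change with the environment, and the storage tests in particular are not rigorous because, for example, they ignore caching. These tests give only a qualitative impression. Board performance is a combined experience that is only meaningful against a real application scenario, and tuning for that scenario matters just as much.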