[贝能 (Burnon) cost-effective ATSAMD51 evaluation board] Benchmark performance tests, part 1: integer performance with Dhrystone

qinyunti · posted 2022-11-30 22:55

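Introduction

Dhrystone is a synthetic benchmark designed by Reinhold P. Weicker in 1984 to measure CPU integer performance. Its result is the number of Dhrystones per second, that is, the number of iterations of the main loop completed per second. We use it here to measure the integer performance of the CPU.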
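For reference, Dhrystone results are also commonly quoted as a VAX MIPS (DMIPS) rating, obtained by dividing Dhrystones per second by 1757, the score of the VAX 11/780 reference machine. The sketch below shows that conversion only; the function names are illustrative and are not taken from dhry_1.c.

```c
#include <stdio.h>

/* Dhrystones per second: main-loop iterations divided by elapsed seconds. */
double dhrystones_per_second(unsigned long runs, double user_time_s)
{
    return (double)runs / user_time_s;
}

/* VAX MIPS (DMIPS): normalise to the VAX 11/780 at 1757 Dhrystones/s. */
double vax_mips(double dhry_per_s)
{
    return dhry_per_s / 1757.0;
}

/* Print both figures for one measurement (illustrative helper). */
void report(unsigned long runs, double user_time_s)
{
    double dps = dhrystones_per_second(runs, user_time_s);
    printf("Dhrystones/s: %.2f, VAX MIPS: %.2f\n", dps, vax_mips(dps));
}
```

A call such as report(runs, User_Time) after the timed loop would print both numbers, assuming runs holds the iteration count.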

Procedure

Adding the code

Getting the code

http://www.roylongbottom.org.uk/classic_benchmarks.tar.gz

Extract classic_benchmarks.tar.gz and copy the \classic_benchmarks\classic_benchmarks\source_code\dhrystone2 folder into your own project directory.

Adding it to the project

Add the source files in the copied dhrystone2 folder to the project.

Porting the interface

In dhry_1.c, comment out the original include:

//#include "cpuidh.h"

and add:

#include "definitions.h"

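Originally, the benchmark measures the execution time (in seconds) of the code between the following two calls and stores it in the global variable User_Time:

start_time();
......
end_time();
User_Time = secs;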

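We use plib_systick.c to get the time instead.

Earlier, the SysTick interrupt period was set to 1 ms in the configuration tool (see SYSTICK_TimerInitialize).

SYSTICK_GetTickCounter returns the number of interrupts so far, that is, the elapsed time in ms.

The timing code can therefore be changed to:

uint32_t s_stime_u32 = SYSTICK_GetTickCounter();
......
uint32_t s_etime_u32 = SYSTICK_GetTickCounter();
User_Time = (s_etime_u32 - s_stime_u32)/1000.0;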
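The tick arithmetic above can be sketched in isolation as follows. The helper names are hypothetical and only illustrate why subtracting two readings of a free-running 32-bit millisecond counter still gives the right interval if the counter wraps once between them.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed milliseconds between two readings of a
 * free-running 32-bit tick counter. Unsigned subtraction wraps modulo 2^32,
 * so the result is still correct if the counter overflowed once in between. */
uint32_t elapsed_ms(uint32_t start_tick, uint32_t end_tick)
{
    return end_tick - start_tick;
}

/* Convert a millisecond count to seconds, the unit used for User_Time. */
double ms_to_seconds(uint32_t ms)
{
    return ms / 1000.0;
}
```

With a 1 ms tick, the measured time has a resolution of 1 ms, so the benchmark loop should run long enough for that to be negligible.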

Rename the entry point from

void main (int argc, char *argv[])

to

void dhry_main(int argc, char *argv[])

Comment out the following:

///getDetails();
///for (i=1; i<10; i++)
///{
///    printf("%s\n", configdata);
///}
///printf("\n");
///fprintf (Ap, " #####################################################\n\n");
///for (i=1; i<10; i++)
///{
///    fprintf(Ap, "%s \n", configdata);
///}
///fprintf (Ap, "\n");

At line 185, the stray text after #endif

   #endif                 "Register option      Selected."

is turned into a comment:

   #endif                // "Register option      Selected."

Comment out the code starting at line 452:

///local_time();
///fprintf (Ap, " #####################################################\n\n");
///fprintf (Ap, " Dhrystone Benchmark 2.1 %s via C/C++ %s\n", options, timeday);
///fprintf (Ap, " VAX MIPS rating:      %12.2lf\n\n",Vax_Mips);

Comment out the code starting at line 130:

///if ((Ap = fopen("Dhry.txt","a+")) == NULL)
///{
///    printf(" Can not open Dhry.txt\n\n");
///    printf(" Press Enter\n\n");
///    int g = getchar();
///    exit(1);
///}

At line 113,

int         nopause = 1;

is changed to

int         nopause = 0;

Testing

In firmware\src\config\default\stdio\keil_monitor.c, modify fputc so that \r\n is sent whenever \n is to be sent:

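int fputc(int c, FILE* stream)
{
    int chenter = '\r';
    uint8_t size = 0;

    /* Send '\r' before every '\n' so the terminal receives "\r\n". */
    if(c == '\n')
    {
        do
        {
            size = SERCOM2_USART_Write((void*)&chenter, 1);
        }while (size != 1);
    }

    /* Only the first byte of the int is sent; this relies on the
       little-endian layout of the Cortex-M4. Retry until accepted. */
    do
    {
        size = SERCOM2_USART_Write((void*)&c, 1);
    }while (size != 1);

    return c;
}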
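The newline handling can be checked on a PC with a stand-in for the UART. This is a minimal sketch: mock_uart_write, tx_log and put_char_crlf are hypothetical names, not part of the Harmony drivers, and the mock simply records the bytes it is given.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UART: records transmitted bytes for inspection. */
uint8_t tx_log[64];
size_t  tx_len = 0;

/* Hypothetical write function. It always reports success; bytes beyond
 * the capacity of tx_log are discarded. */
size_t mock_uart_write(const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (tx_len < sizeof tx_log) {
            tx_log[tx_len++] = data[i];
        }
    }
    return len;
}

/* Send one character, expanding '\n' to "\r\n". */
int put_char_crlf(int c)
{
    uint8_t ch = (uint8_t)c;   /* explicit byte copy, independent of endianness */

    if (c == '\n') {
        const uint8_t cr = '\r';
        while (mock_uart_write(&cr, 1) != 1) { /* retry until accepted */ }
    }
    while (mock_uart_write(&ch, 1) != 1) { /* retry until accepted */ }
    return c;
}
```

Copying the character into a uint8_t before sending avoids depending on byte order, which is a small variation on the fputc shown above rather than the author's code.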

In main.c, declare

void dhry_main (int argc, char *argv[]);

and call it from main:

int main ( void )
{
    /* Initialize all modules */
    SYS_Initialize ( NULL );

    /* Start the 1 ms SysTick used for timing */
    SYSTICK_TimerStart();

    /* Enable the peripheral multiplexer on PB24 and PB25 and select function D */
    PORT_REGS->GROUP[1].PORT_PINCFG[24] = 0x1U;
    PORT_REGS->GROUP[1].PORT_PINCFG[25] = 0x1U;
    PORT_REGS->GROUP[1].PORT_PMUX[12] = 0x33U;

    /* Run the benchmark */
    dhry_main(0,0);

    while ( true )
    {
        /* Maintain state machines of all polled MPLAB Harmony modules. */
        SYS_Tasks ( );
    }

    /* Execution should not come here during normal operation */
    return ( EXIT_FAILURE );
}

Download the program and run the test.

The printed output is shown below.

The score is 88.33.

[Screenshot: image-20221130225034-2.png]

Next, we compile with -Ofast.

[Screenshot: image-20221130225034-3.png]

The score improves considerably, to 130.05.

[Screenshot: image-20221130225034-4.png]
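As a quick check on the size of the gain, the ratio of the -Ofast score to the score of the first build is:

```latex
\[
\frac{130.05}{88.33} \approx 1.47
\]
```

That is, the -Ofast build scores roughly 47% higher than the first build.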

We then also enable the dcache.

In startup_keil.c, add at line 135:

    ICache_Enable();
    DCache_Enable();

The score is unchanged, which suggests that data fetches are not the bottleneck.

[Screenshot: image-20221130225034-5.png]

If the icache is not enabled:

    //ICache_Enable();
    DCache_Enable();

the score drops sharply, so instruction fetch from flash is the real performance bottleneck.

[Screenshot: image-20221130225034-6.png]

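This shows that slow instruction fetch from flash is the bottleneck. To run faster, the code can be executed from internal RAM.

That comparison is not tested here; interested readers can try it.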

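Summary

In the tests above, the highest Dhrystone score obtained is about 130.

The main factor limiting performance is the speed of reading instructions from flash.

A CPU generally runs faster than flash can be accessed, so wait states usually have to be inserted on flash reads, and the higher the CPU clock, the more wait states are needed. STM32 chips, for example, have a table that maps clock frequency to wait states; configuring fewer wait states than the minimum required for the chosen frequency can cause instruction read errors. For best performance, the wait states should be set to the smallest value within the permitted range. For stability, however, for example to cover different temperature and humidity conditions, it is best to set the wait states to the maximum value, which gives more margin and better reliability.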
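The idea of such a frequency-to-wait-state table can be sketched as below. This is only an illustration: it assumes evenly spaced frequency steps, and min_wait_states and its max_hz_per_step parameter are hypothetical. The real limits for a given chip and supply voltage must be taken from its datasheet.

```c
#include <stdint.h>

/* Minimum number of flash wait states for a given CPU clock, assuming the
 * flash can be read with 0 wait states up to max_hz_per_step, with 1 wait
 * state up to 2 * max_hz_per_step, and so on (assumed linear steps). */
uint32_t min_wait_states(uint32_t cpu_hz, uint32_t max_hz_per_step)
{
    if (cpu_hz == 0u || max_hz_per_step == 0u) {
        return 0u;   /* nothing sensible to compute */
    }
    /* ceil(cpu_hz / max_hz_per_step) - 1, in integer arithmetic */
    return (cpu_hz - 1u) / max_hz_per_step;
}
```

For example, with an assumed step of 30 MHz, a 120 MHz clock would need at least 3 wait states under this model; any extra wait states added beyond that trade speed for margin.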

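I did not find the flash wait-state setting for this chip. The wait states may be inserted automatically; I have not checked the manual carefully, so interested readers can look it up.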