zzx1997 posted on 2021-3-19 20:53

[MYIR Edge AI Computing Box FZ5 Review] Handwritten Digit Recognition: Accelerating a LeNet Neural Network

<div class='showpostmsg'><p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">Having covered the basic configuration of the neural network, I now use HLS to implement a basic version of it. The network is not quantized yet, so all data use the float type. Because the network's feature maps are large and the FPGA's on-chip RAM cannot easily hold the whole network, the network is accelerated layer by layer: each layer reads its inputs from DDR and writes its results back to DDR. This lowers the on-chip RAM needed during computation, but makes the design more easily limited by DDR read/write bandwidth. The HLS program is briefly introduced below:</span></span></span></p>
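<p>To illustrate why per-layer execution lowers the on-chip storage requirement, the sketch below compares keeping every feature map and weight on chip at once with keeping only one layer's input, output and weights at a time. It assumes the classic LeNet-5 layer shapes (32x32 input, 5x5 convolutions, 2x2 pooling, 120/84/10 fully connected units); the struct and function names are hypothetical, and the network in this post may differ.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Element counts for one layer. Shapes follow the classic LeNet-5 and are
// an assumption; the network used in this post may differ.
struct LayerSize {
    const char* name;
    std::size_t in_elems;      // input feature map elements
    std::size_t out_elems;     // output feature map elements
    std::size_t weight_elems;  // weights + biases (0 for pooling)
};

const std::vector<LayerSize> kLeNet5 = {
    {"conv1", 1 * 32 * 32, 6 * 28 * 28, 6 * 1 * 5 * 5 + 6},
    {"pool1", 6 * 28 * 28, 6 * 14 * 14, 0},
    {"conv2", 6 * 14 * 14, 16 * 10 * 10, 16 * 6 * 5 * 5 + 16},
    {"pool2", 16 * 10 * 10, 16 * 5 * 5, 0},
    {"fc1", 400, 120, 400 * 120 + 120},
    {"fc2", 120, 84, 120 * 84 + 84},
    {"fc3", 84, 10, 84 * 10 + 10},
};

// Floats needed if the network input, every feature map and every weight
// are all resident on chip at the same time.
std::size_t whole_network_floats(const std::vector<LayerSize>& net) {
    assert(!net.empty());
    std::size_t total = net.front().in_elems;  // network input
    for (const auto& l : net) total += l.out_elems + l.weight_elems;
    return total;
}

// Floats needed when only one layer's input, output and weights are on
// chip at a time (layer-by-layer execution through DDR).
std::size_t per_layer_peak_floats(const std::vector<LayerSize>& net) {
    std::size_t peak = 0;
    for (const auto& l : net)
        peak = std::max(peak, l.in_elems + l.out_elems + l.weight_elems);
    return peak;
}
```

<p>With these assumed shapes, whole-network residency needs 70,824 floats (about 283 KB), while the largest single layer, fc1 with its 48,120 weights, needs 48,640 floats (about 195 KB).</p>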

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">First, an AXI master interface is used to read the image and weight data held on the PS side, and an AXI-Lite interface is used to configure the physical addresses to read from.</span></span></span></p>
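<p>A minimal sketch of how such an interface is typically declared in Vivado HLS, assuming a single fully connected layer as the body: the m_axi INTERFACE pragmas let the core read inputs and weights from DDR and write results back, while offset=slave together with s_axilite places each pointer's base address, the scalar arguments and the start/done control in one AXI-Lite register map. The function name fc_layer_top, the bundle names, the depth values and the ReLU activation are assumptions, not the author's code.</p>

```cpp
#include <cassert>

// Hypothetical top-level function computing one fully connected layer:
// out = ReLU(W * in + bias). Names, bundles and depths are assumptions.
void fc_layer_top(const float* in, const float* weights, const float* bias,
                  float* out, int n_in, int n_out) {
    // AXI master ports: the core reads/writes PS-side DDR directly.
#pragma HLS INTERFACE m_axi port=in offset=slave bundle=gmem depth=400
#pragma HLS INTERFACE m_axi port=weights offset=slave bundle=gmem depth=48000
#pragma HLS INTERFACE m_axi port=bias offset=slave bundle=gmem depth=120
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem depth=120
    // AXI-Lite: base addresses, layer sizes and block-level control.
#pragma HLS INTERFACE s_axilite port=in bundle=control
#pragma HLS INTERFACE s_axilite port=weights bundle=control
#pragma HLS INTERFACE s_axilite port=bias bundle=control
#pragma HLS INTERFACE s_axilite port=out bundle=control
#pragma HLS INTERFACE s_axilite port=n_in bundle=control
#pragma HLS INTERFACE s_axilite port=n_out bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    assert(n_in > 0 && n_out > 0);  // checked during C simulation
    for (int o = 0; o < n_out; ++o) {
        float acc = bias[o];
        for (int i = 0; i < n_in; ++i)
            acc += weights[o * n_in + i] * in[i];  // weights stored [n_out][n_in]
        out[o] = acc > 0.0f ? acc : 0.0f;          // ReLU (assumed activation)
    }
}
```

<p>The depth values matter only for C/RTL co-simulation; here they are sized for an assumed 400-to-120 layer.</p>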

<p style="text-indent:24.0pt; text-align:justify"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">The function that loads data over the AXI bus is shown below:</span></span></span></p>
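<p>A load function of this kind can be sketched as follows, under stated assumptions: it copies one layer's input feature map and weights from DDR into on-chip arrays with memcpy. The buffer capacities (conv2 of a classic LeNet-5) and the names load_layer, kInMax and kWMax are illustrative, not taken from the post.</p>

```cpp
#include <cassert>
#include <cstring>

// Assumed capacities: conv2 of a classic LeNet-5 (6x14x14 input,
// 16x6x5x5 weights + 16 biases). Adjust to the real network.
const int kInMax = 6 * 14 * 14;
const int kWMax = 16 * 6 * 5 * 5 + 16;

// Copy one layer's input feature map and weights from DDR into on-chip
// arrays. In HLS, ddr_in/ddr_w would be m_axi pointers; memcpy on such
// pointers may be turned into burst reads (check the synthesis log).
void load_layer(const float* ddr_in, const float* ddr_w,
                float in_buf[kInMax], float w_buf[kWMax],
                int in_count, int w_count) {
    assert(in_count <= kInMax && w_count <= kWMax);  // bounds for C simulation
    std::memcpy(in_buf, ddr_in, in_count * sizeof(float));
    std::memcpy(w_buf, ddr_w, w_count * sizeof(float));
}
```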

<p style="text-indent:24.0pt; text-align:justify"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">According to UG902, memcpy on an AXI bus interface can be synthesized into actual hardware. Once the data are loaded, the network is computed layer by layer:</span></span></span></p>
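<p>LeNet alternates convolution, pooling and fully connected layers, and in a layer-by-layer design each of them is one pass of the accelerator. As an example of one such pass, the sketch below implements 2x2 max pooling with stride 2 on a channel-major (CHW) feature map held in on-chip arrays. Max pooling, the CHW layout, the pool1 capacity constants and the function name are assumptions for illustration.</p>

```cpp
#include <cassert>

// Assumed capacity: pool1 of a classic LeNet-5 (6x28x28 -> 6x14x14).
const int kMaxC = 6, kMaxH = 28, kMaxW = 28;

// 2x2 max pooling, stride 2, on a CHW feature map in on-chip arrays.
// In a layer-by-layer design the input was loaded from DDR beforehand and
// the output is written back to DDR for the next layer.
void maxpool2x2(const float in[kMaxC * kMaxH * kMaxW],
                float out[kMaxC * (kMaxH / 2) * (kMaxW / 2)],
                int c, int h, int w) {
    assert(c <= kMaxC && h <= kMaxH && w <= kMaxW);
    assert(h % 2 == 0 && w % 2 == 0);
    const int oh = h / 2, ow = w / 2;
    for (int ch = 0; ch < c; ++ch) {
        for (int y = 0; y < oh; ++y) {
            for (int x = 0; x < ow; ++x) {
                // Top-left element of the 2x2 window.
                const int base = ch * h * w + (2 * y) * w + 2 * x;
                float m = in[base];
                if (in[base + 1] > m) m = in[base + 1];
                if (in[base + w] > m) m = in[base + w];
                if (in[base + w + 1] > m) m = in[base + w + 1];
                out[ch * oh * ow + y * ow + x] = m;
            }
        }
    }
}
```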

<p align="center" style="text-align:center; text-indent:24.0pt"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">The computation uses plain for loops rather than a hardware architecture tailored to it. The for loops are optimized only with the pipeline directive, and array partitioning maps arrays directly onto flip-flops so that many elements can be read in the same clock cycle, easing the memory-bandwidth bottleneck during computation. After writing the code following the LeNet network structure, I ran synthesis; the HLS synthesis results are as follows:</span></span></span></p>
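<p>The combination of PIPELINE and ARRAY_PARTITION described above can be sketched on a 5x5 convolution. This is an assumed example (conv2 shapes of a classic LeNet-5, ReLU activation, hypothetical function name conv5x5_relu), not the author's code. Pipelining the input-channel loop makes HLS fully unroll the two kernel loops inside it, and completely partitioning the two kernel dimensions of w turns those 25 taps into separate registers.</p>

```cpp
#include <cassert>

// Assumed shapes: conv2 of a classic LeNet-5 (6x14x14 in, 16x10x10 out,
// 5x5 kernel, stride 1, no padding). ReLU is also an assumption.
const int K = 5;
const int IN_C = 6, IN_H = 14, IN_W = 14;
const int OUT_C = 16, OUT_H = IN_H - K + 1, OUT_W = IN_W - K + 1;

void conv5x5_relu(const float in[IN_C][IN_H][IN_W],
                  const float w[OUT_C][IN_C][K][K],
                  const float bias[OUT_C],
                  float out[OUT_C][OUT_H][OUT_W]) {
    // Split the 5x5 kernel dimensions into individual registers so all
    // 25 taps of one (oc, ic) pair can be read in the same cycle.
#pragma HLS ARRAY_PARTITION variable=w complete dim=3
#pragma HLS ARRAY_PARTITION variable=w complete dim=4
    for (int oc = 0; oc < OUT_C; ++oc) {
        for (int y = 0; y < OUT_H; ++y) {
            for (int x = 0; x < OUT_W; ++x) {
                float acc = bias[oc];
                for (int ic = 0; ic < IN_C; ++ic) {
                    // Pipelining this loop fully unrolls the ky/kx loops below.
#pragma HLS PIPELINE II=1
                    float partial = 0.0f;
                    for (int ky = 0; ky < K; ++ky)
                        for (int kx = 0; kx < K; ++kx)
                            partial += in[ic][y + ky][x + kx] * w[oc][ic][ky][kx];
                    acc += partial;
                }
                out[oc][y][x] = acc > 0.0f ? acc : 0.0f;  // ReLU
            }
        }
    }
}
```

<p>In this form the 25 input reads still come from a single array, and the floating-point accumulation into acc is a loop-carried dependency, so HLS will likely report an II above 1; line buffers or a partitioned input window are common next steps.</p>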

<p align="center" style="text-align:center; text-indent:24.0pt"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">Overall resource usage is low. However, HLS synthesis results are not precise: the timing and resource figures are only estimates, and more accurate numbers require synthesis in Vivado. After synthesis, the IP is exported so that it can be used to build the Vivado project. The completed Vivado project is shown below:</span></span></span></p>

<p style="text-indent:24.0pt; text-align:justify"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">The project is then compiled, with the following results:</span></span></span></p>

<p align="center" style="text-align:center; text-indent:24.0pt"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">The whole project uses many LUTs but few DSPs, so the HLS part still has plenty of room for optimization. Next, the generated tcl, bit and hwh files are imported into PYNQ, and the ipynb notebook is opened to run the program:</span></span></span></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">First, configure the relevant registers:</span></span></span></p>
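<p>The post configures these registers from the PYNQ notebook. At register level, one detail worth knowing on Zynq UltraScale+ is that a 64-bit physical address for an m_axi offset is typically split across two 32-bit AXI-Lite registers (low word, then high word). The C++ sketch below shows that split as it might look in bare-metal or Linux mmap code; the helper names and the register offset 0x10 in the usage are hypothetical, and the real offsets come from the HLS-generated driver header or register map.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split a 64-bit physical address into the low and high 32-bit words.
std::pair<uint32_t, uint32_t> split_address(uint64_t phys) {
    return {static_cast<uint32_t>(phys & 0xFFFFFFFFu),
            static_cast<uint32_t>(phys >> 32)};
}

// Write one address argument into a memory-mapped AXI-Lite register block.
// `regs` points at the IP's base address; `lo_offset` is the byte offset of
// the low word, and the high word is assumed to sit at lo_offset + 4.
void write_address(volatile uint32_t* regs, uint32_t lo_offset, uint64_t phys) {
    const auto words = split_address(phys);
    regs[lo_offset / 4] = words.first;
    regs[lo_offset / 4 + 1] = words.second;
}
```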

<p align="center" style="text-align:center; text-indent:24.0pt"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">The exact register addresses can be found in Vivado or SDK. The accelerator is then run layer by layer:</span></span></span></p>
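<p>Running the network layer by layer amounts to repeating the HLS start/done handshake once per layer, waiting for each layer to finish because the next one reads its output from DDR. The sketch below assumes the default ap_ctrl_hs block-level protocol (ap_start is bit 0 and ap_done bit 1 of the control register at offset 0x00, with ap_done cleared on read) and a hypothetical layer argument at offset 0x40. FakeRegs is a stand-in register block so the sequence can be checked off-target; none of these names come from the post.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// ap_ctrl_hs control register at offset 0x00.
const uint32_t kCtrlOffset = 0x00;
const uint32_t kApStart = 1u << 0;
const uint32_t kApDone = 1u << 1;

// Hypothetical offset of a `layer` argument; the real value comes from the
// HLS-generated register map.
const uint32_t kLayerOffset = 0x40;

// Start the accelerator once per layer and wait for ap_done before moving on.
template <typename Regs>
void run_all_layers(Regs& regs, int n_layers) {
    for (int layer = 0; layer < n_layers; ++layer) {
        regs.write(kLayerOffset, static_cast<uint32_t>(layer));
        regs.write(kCtrlOffset, kApStart);
        while ((regs.read(kCtrlOffset) & kApDone) == 0) {
            // busy-wait; a real driver might sleep or use the interrupt
        }
    }
}

// Stand-in register block: records writes and reports ap_done right after
// ap_start, clearing it on read like the HLS-generated control register.
struct FakeRegs {
    std::vector<std::pair<uint32_t, uint32_t>> writes;
    uint32_t ctrl = 0;
    void write(uint32_t off, uint32_t v) {
        writes.push_back({off, v});
        if (off == kCtrlOffset && (v & kApStart)) ctrl = kApDone;
    }
    uint32_t read(uint32_t off) {
        if (off != kCtrlOffset) return 0;
        const uint32_t v = ctrl;
        ctrl = 0;  // clear-on-read
        return v;
    }
};
```

<p>On the board, Regs would wrap real reads and writes at the IP's AXI-Lite base address, such as the one listed in Vivado's address editor.</p>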

<p align="center" style="text-align:center; text-indent:24.0pt"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">The final recognition result:</span></span></span></p>

<p align="center" style="text-align:center; text-indent:24.0pt"></p>

<p style="text-indent:24.0pt; text-align:justify"><span style="font-size:12pt"><span style="line-height:150%"><span style="font-family:&quot;Times New Roman&quot;,serif">The result is correct, and inference on one image takes about 0.005 seconds, which corresponds to roughly 200 frames per second (1 / 0.005 s = 200). The performance is good.</span></span></span></p>
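<p>The frame-rate figure is the reciprocal of the per-image latency, which holds when images are processed one at a time with no overlap. A small C++ sketch of measuring the mean latency and converting it, with hypothetical helper names:</p>

```cpp
#include <cassert>
#include <chrono>

// Per-image latency in seconds -> frames per second, assuming images are
// processed one after another with no overlap.
double fps_from_latency(double seconds_per_image) {
    assert(seconds_per_image > 0.0);
    return 1.0 / seconds_per_image;
}

// Time n back-to-back calls of `infer_one` and return the mean latency (s).
template <typename F>
double mean_latency_seconds(F&& infer_one, int n) {
    assert(n > 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) infer_one();
    const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return dt.count() / n;
}
```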

</div>

Jacktang posted on 2021-3-20 21:08

<p>"The whole project uses many LUTs and relatively few DSPs"</p>

<p>How can you tell?</p>