AI Challenge Camp (Advanced) 5: Registering and Recognizing Multiple Faces

zhuxirui posted on 2024-12-28 13:20

This post was last edited by zhuxirui on 2024-12-29 15:51

<div>
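<div>This installment shares code modified from the <a href="https://github.com/LuckfoxTECH/luckfox_pico_rknn_example/tree/kernel-5.10.160/example/luckfox_pico_retinaface_facenet/src" target="_blank">official example</a>. It detects multiple faces at once and prints the results to the console. Below is a brief walk-through of the changes.</div>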

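<div>First, the facenet and arcface models differ in input and output size: facenet takes a 160x160 image and outputs a 128-dimensional feature vector, while arcface takes a 112x112 input and outputs 512 dimensions. So both the model input size defined in the code and the length of the output vector array have to be changed.</div>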
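<div>As a minimal sketch of that size change, the two sets of dimensions can be kept side by side as compile-time constants. The struct and helper names here (<code>EmbedderSpec</code>, <code>input_bytes</code>, <code>embedding_bytes</code>) are hypothetical and are not part of the official example; only the numbers come from the paragraph above, and a 3-channel 8-bit input is assumed.</div>

```cpp
#include <cstddef>

// Hypothetical description of a face-embedding model's I/O shape.
struct EmbedderSpec {
    int input_width;            // model input width in pixels
    int input_height;           // model input height in pixels
    std::size_t embedding_len;  // number of floats in the output feature vector
};

// Values taken from the post: facenet 160x160 -> 128, arcface 112x112 -> 512.
constexpr EmbedderSpec kFaceNet{160, 160, 128};
constexpr EmbedderSpec kArcFace{112, 112, 512};

// Bytes of a 3-channel, 8-bit input image (assumed layout).
constexpr std::size_t input_bytes(const EmbedderSpec& s) {
    return static_cast<std::size_t>(s.input_width) * s.input_height * 3;
}

// Bytes needed to hold one float embedding.
constexpr std::size_t embedding_bytes(const EmbedderSpec& s) {
    return s.embedding_len * sizeof(float);
}
```

<div>With this, a buffer for one arcface embedding would be allocated as <code>malloc(embedding_bytes(kArcFace))</code> instead of a hard-coded <code>sizeof(float) * 512</code>.</div>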

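<div>Next comes registering multiple faces. The program takes the path of the chosen face folder, runs inference once on each image to get its feature vector and stores it, and adds the file name to the name set as the person's name. Frames captured from the camera are fed one by one to retinaface; each candidate face box is then passed to arcface, and the face feature library is traversed to find the minimum Euclidean distance. If that minimum is below the threshold it is recorded, and the results are printed once every face in the frame has been processed. The confidence score is simply score = 1/(1+distance(x<sub>0</sub>,x<sub>min</sub>)), which is crude but easy. In testing, a distance below 1.2 can generally be treated as the same person, so a score above 0.45 means a face from the library has been recognized.</div>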
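<div>The matching rule described above can be sketched in isolation, without the RKNN runtime. This is an illustration only: <code>Match</code>, <code>euclidean_distance</code> and <code>find_best_match</code> are hypothetical names, embeddings are plain <code>std::vector&lt;float&gt;</code>, and the score uses the formula from the post.</div>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Match {
    int index;       // position in the gallery, or -1 if nothing is close enough
    float distance;  // Euclidean distance to the nearest gallery entry
    float score;     // 1 / (1 + distance), or 0 when there is no match
};

// Euclidean distance between two embeddings of equal length.
float euclidean_distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Find the nearest registered embedding and accept it only below max_distance.
Match find_best_match(const std::vector<float>& probe,
                      const std::vector<std::vector<float>>& gallery,
                      float max_distance) {
    Match best{-1, std::numeric_limits<float>::max(), 0.0f};
    for (std::size_t i = 0; i < gallery.size(); ++i) {
        const float d = euclidean_distance(probe, gallery[i]);
        if (d < best.distance) {
            best.index = static_cast<int>(i);
            best.distance = d;
        }
    }
    if (best.index >= 0 && best.distance < max_distance) {
        best.score = 1.0f / (1.0f + best.distance);
    } else {
        best.index = -1;  // nearest face is still too far away
    }
    return best;
}
```

<div>With the 1.2 threshold from the post, the lowest score an accepted match can get is just above 1/(1+1.2), roughly 0.45, which is where the score cutoff quoted above comes from.</div>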

<div>The modified main.cc is as follows:</div>

<div>
<pre>
<code class="language-cpp">// Copyright (c) 2023 by Rockchip Electronics Co., Ltd. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/*-------------------------------------------
            Includes
-------------------------------------------*/
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/fb.h>

#include <iomanip>
#include <sstream>
#include <string>
#include <cstring>
#include <dirent.h>
#include <libgen.h>
#include <stdlib.h>
#include <vector>
#include <utility>
#include <algorithm>
#include <iostream>
#include <unordered_set>
#include "retinaface_facenet.h"
#include <time.h>
#include <sys/time.h>

#include "dma_alloc.cpp"

#define USE_DMA 0

/*-------------------------------------------
               Main Function
-------------------------------------------*/
int main(int argc,char **argv)
{

//if (argc != 4)
//{
// printf("%s <retinaface model_path> <facenet model_path> <reference pic_path> \n", argv[0]);
// return -1;
//}
system("RkLunch-stop.sh");

//const char *model_path= argv[1];
//const char *model_path2 = argv[2];
//const char *image_path= argv[3];
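if (argc != 3) {
   std::cerr << "Usage: " << argv[0] << " <folder_path> <distance_threshold>\n";
   return -1;
}
// Distance threshold, taken from the second command-line argument
float set_dis = atof(argv[2]);
// Folder path, taken from the first command-line argument
char* folder_path = argv[1];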

struct stat path_stat;
if (stat(folder_path, &path_stat) != 0 || !S_ISDIR(path_stat.st_mode)) {
   std::cerr << "The provided path is not a valid directory." << std::endl;
   return -1;
}

// Open the directory
DIR* dir = opendir(folder_path);
if (dir == nullptr) {
   std::cerr << "Failed to open directory." << std::endl;
   return -1;
}

// Walk the folder and collect each file's path and name (without extension)
std::vector<std::string> file_paths;
std::vector<std::string> file_names_without_extension;
struct dirent* entry;
while ((entry = readdir(dir)) != nullptr) {
   if (entry->d_type == DT_REG) { // regular file
         std::string file_path = std::string(folder_path) + "/" + entry->d_name;
         file_paths.push_back(file_path);

         // Get the file name without its extension
         char* base_name = basename(entry->d_name);
         std::string name_without_extension = base_name;
         size_t last_dot_pos = name_without_extension.find_last_of('.');
         if (last_dot_pos != std::string::npos) {
            name_without_extension = name_without_extension.substr(0, last_dot_pos);
         }
         file_names_without_extension.push_back(name_without_extension);
   }
}
closedir(dir);

// Number of files
size_t face_target = file_paths.size();

// Build a char** array holding the file paths
char** image_path_collection = new char*[face_target];
for (size_t i = 0; i < face_target; ++i) {
   image_path_collection[i] = new char[file_paths[i].size() + 1];
   std::strcpy(image_path_collection[i], file_paths[i].c_str());
}

// Build a char** array holding the file names (without extension)
char** name_collection = new char*[face_target];
for (size_t i = 0; i < face_target; ++i) {
   name_collection[i] = new char[file_names_without_extension[i].size() + 1];
   std::strcpy(name_collection[i], file_names_without_extension[i].c_str());
}
const char *model_path= "./model/retinaface.rknn";
const char *model_path2 = "./model/arcface.rknn";
const char *image_path= "./model/test.jpg";
float* out_fp32 = (float*)malloc(sizeof(float) * 512); // holds one 512-float embedding for the current face
float** out_fp32_collection = (float**)malloc(sizeof(float*) * face_target); // one embedding pointer per registered face
clock_t start_time;
clock_t end_time;

//Model Input
//Retinaface
int retina_width = 640;
int retina_height = 640;
//Facenet
int facenet_width = 112;
int facenet_height= 112;
int channels = 3;

int ret;
rknn_app_context_t app_retinaface_ctx;
rknn_app_context_t app_facenet_ctx;
object_detect_result_list od_results;

memset(&app_retinaface_ctx, 0, sizeof(rknn_app_context_t));
memset(&app_facenet_ctx, 0, sizeof(rknn_app_context_t));

//Init Model
init_retinaface_facenet_model(model_path, model_path2, &app_retinaface_ctx, &app_facenet_ctx);

//Init fb
int disp_flag = 0;
int pixel_size = 0;
size_t screensize = 0;
int disp_width= 0;
int disp_height = 0;
void* framebuffer = NULL;
struct fb_fix_screeninfo fb_fix;
struct fb_var_screeninfo fb_var;

int framebuffer_fd = 0; //for DMA
cv::Mat disp;

int fb = open("/dev/fb0", O_RDWR);
if(fb == -1)
   printf("Screen OFF!\n");
else
   disp_flag = 1;

if(disp_flag){
   ioctl(fb, FBIOGET_VSCREENINFO, &fb_var);
   ioctl(fb, FBIOGET_FSCREENINFO, &fb_fix);

   disp_width = fb_var.xres;
   disp_height = fb_var.yres;
   pixel_size = fb_var.bits_per_pixel / 8;
   printf("Screen width = %d, Screen height = %d, Pixel_size = %d\n",disp_width, disp_height, pixel_size);

   screensize = disp_width * disp_height * pixel_size;
   framebuffer = (uint8_t*)mmap(NULL, screensize, PROT_READ | PROT_WRITE, MAP_SHARED, fb, 0);

   if( pixel_size == 4 )//ARGB8888
         disp = cv::Mat(disp_height, disp_width, CV_8UC3);
   else if ( pixel_size == 2 ) //RGB565
         disp = cv::Mat(disp_height, disp_width, CV_16UC1);

#if USE_DMA
   dma_buf_alloc(RV1106_CMA_HEAP_PATH,
                  disp_width * disp_height * pixel_size,
                  &framebuffer_fd,
                  (void **) & (disp.data));
#endif
}
else{
   disp_height = 240;
   disp_width = 240;
}

//Init Opencv-mobile
cv::VideoCapture cap;
cv::Mat bgr(disp_height, disp_width, CV_8UC3);
cv::Mat retina_input(retina_height, retina_width, CV_8UC3, app_retinaface_ctx.input_mems[0]->virt_addr);
cap.set(cv::CAP_PROP_FRAME_WIDTH,disp_width);
cap.set(cv::CAP_PROP_FRAME_HEIGHT, disp_height);
cap.open(0);

//Get reference img feature
cv::Mat image = cv::imread(image_path);
cv::Mat facenet_input(facenet_height, facenet_width, CV_8UC3, app_facenet_ctx.input_mems[0]->virt_addr);
letterbox(image,facenet_input);
ret = rknn_run(app_facenet_ctx.rknn_ctx, nullptr);
if (ret < 0) {
   printf("rknn_run fail! ret=%d\n", ret);
   return -1;
}
uint8_t*output = (uint8_t *)(app_facenet_ctx.output_mems[0]->virt_addr);
float* reference_out_fp32 = (float*)malloc(sizeof(float) * 512);
//output_normalization(&app_facenet_ctx,output,reference_out_fp32);
//memset(facenet_input.data, 0, facenet_width * facenet_height * channels);

//float* out_fp32 = (float*)malloc(sizeof(float) * 128);

// Register every face in the folder: one inference and one stored embedding per image
for(size_t i = 0; i < face_target; i++){
   cv::Mat image = cv::imread(image_path_collection[i]);
   cv::Mat facenet_input(facenet_height, facenet_width, CV_8UC3, app_facenet_ctx.input_mems[0]->virt_addr);
   letterbox(image,facenet_input);
   ret = rknn_run(app_facenet_ctx.rknn_ctx, nullptr);
   if (ret < 0) {
      printf("rknn_run fail! ret=%d\n", ret);
      return -1;
   }
   uint8_t*output = (uint8_t *)(app_facenet_ctx.output_mems[0]->virt_addr);
   float* reference_out_fp32 = (float*)malloc(sizeof(float) * 512);
   output_normalization(&app_facenet_ctx,output,reference_out_fp32);
   out_fp32_collection[i] = reference_out_fp32;
}

//

char show_text[32];
char fps_text[32];
float fps = 0;

while(1)
{
   start_time = clock();
   //opencv get photo
   cap >> bgr;

   cv::resize(bgr, retina_input, cv::Size(retina_width,retina_height), 0, 0, cv::INTER_LINEAR);
   ret = inference_retinaface_model(&app_retinaface_ctx, &od_results);
   if (ret != 0)
   {
         printf("init_retinaface_model fail! ret=%d\n", ret);
         return -1;
   }
   //printf("running------------------\n");
   for (int i = 0; i < od_results.count; i++)
   {
         //Get det
         object_detect_result *det_result = &(od_results.results[i]);
         mapCoordinates(bgr, retina_input, &det_result->box.left , &det_result->box.top);
         mapCoordinates(bgr, retina_input, &det_result->box.right, &det_result->box.bottom);

         cv::rectangle(bgr,cv::Point(det_result->box.left ,det_result->box.top),
                     cv::Point(det_result->box.right,det_result->box.bottom),cv::Scalar(0,255,0),3);

         //Face capture
         cv::Rect roi(det_result->box.left,det_result->box.top,
                     (det_result->box.right - det_result->box.left),
                     (det_result->box.bottom - det_result->box.top));
         cv::Mat face_img = bgr(roi);

         //Give five key points
         // for(int j = 0; j < 5;j ++)
         // {
         // //printf("point_x = %d point_y = %d\n",det_result->point[j].x,
         // //                                  det_result->point[j].y);
         // cv::circle(bgr,cv::Point(det_result->point[j].x,det_result->point[j].y),10,cv::Scalar(0,255,0),3);
         // }

         letterbox(face_img,facenet_input);
         ret = rknn_run(app_facenet_ctx.rknn_ctx, nullptr);
         if (ret < 0) {
            printf("rknn_run fail! ret=%d\n", ret);
            return -1;
         }
         output = (uint8_t *)(app_facenet_ctx.output_mems[0]->virt_addr);
         float *norm_list = (float*)malloc(sizeof(float) * face_target);
         output_normalization(&app_facenet_ctx, output, out_fp32);

         std::vector<std::pair<float, int>> distances; // (distance, label index) pairs
         std::vector<std::pair<std::string, float>> detected_names_with_scores; // (name, score) pairs
         std::unordered_set<int> matched_indices; // label indices already matched

         // Compute the distance to every registered face
         for (int k = 0; k < (int)face_target; k++) {
            float distance = get_duclidean_distance(out_fp32, out_fp32_collection[k]);
            distances.push_back(std::make_pair(distance, k));
         }

         // Sort labels by distance, nearest first
         std::sort(distances.begin(), distances.end());

         // Walk the sorted list and collect every label below the threshold
         for (const auto& pair : distances) {
            if (pair.first < set_dis) {
               // Make sure each label is matched only once
               if (matched_indices.find(pair.second) == matched_indices.end()) {
                     // Score: the smaller the distance, the higher the score
                     float score = 1.0f / (1.0f + pair.first);
                     detected_names_with_scores.emplace_back(name_collection[pair.second], score);
                     matched_indices.insert(pair.second); // remember this label as matched
               }
            }
         }

         // Build the final string of detected names and scores
         std::ostringstream name_detected_stream;
         for (size_t n = 0; n < detected_names_with_scores.size(); n++) {
            if (n > 0) {
               name_detected_stream << ", ";
            }
            // std::fixed and std::setprecision print the score with two decimals
            name_detected_stream << detected_names_with_scores[n].first
                                 << " (" << std::fixed << std::setprecision(2) << detected_names_with_scores[n].second << ")";
         }
         std::string name_detected = name_detected_stream.str();

         // Print the detected names and scores
         if (!name_detected.empty()) {
            printf("detected: %s\n", name_detected.c_str());
         }

         // Release memory
         free(norm_list);

         //sprintf(show_text,"norm=%f",norm);
         //cv::putText(bgr, show_text, cv::Point(det_result->box.left, det_result->box.top - 8),
         //                            cv::FONT_HERSHEY_SIMPLEX,0.5,
         //                            cv::Scalar(0,255,0),
         //                            1);

   }

   if(disp_flag){
         //Fps Show
         sprintf(fps_text,"fps=%.1f",fps);
         cv::putText(bgr,fps_text,cv::Point(0, 20),
                     cv::FONT_HERSHEY_SIMPLEX,0.5,
                     cv::Scalar(0,255,0),1);

         //LCD Show
         if( pixel_size == 4 )
            cv::cvtColor(bgr, disp, cv::COLOR_BGR2BGRA);
         else if( pixel_size == 2 )
            cv::cvtColor(bgr, disp, cv::COLOR_BGR2BGR565);
         memcpy(framebuffer, disp.data, disp_width * disp_height * pixel_size);
#if USE_DMA
         dma_sync_cpu_to_device(framebuffer_fd);
#endif
   }
   //Update Fps
   end_time = clock();
   fps = ((float)CLOCKS_PER_SEC / (end_time - start_time)) ;
}

free(reference_out_fp32);
free(out_fp32);

if(disp_flag){
   close(fb);
   munmap(framebuffer, screensize);
#if USE_DMA
   dma_buf_free(disp_width*disp_height*2,
                  &framebuffer_fd,
                  bgr.data);
#endif
}

release_facenet_model(&app_facenet_ctx);
release_retinaface_model(&app_retinaface_ctx);

return 0;
}
</code></pre>

<p> </p>
</div>

<div>After building, upload the contents of install/luckfox_pico_retinaface_facenet_demo to the board.</div>

<div> </div>

<div>
<pre>
<code class="language-bash">chmod 777 ./luckfox_pico_retinaface_facenet
./luckfox_pico_retinaface_facenet ./test/person 1.2</code></pre>

<p>Here ./test/person is the folder holding the faces to register, and 1.2 is the distance threshold.</p>
</div>
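<div>Because each registered person's name is taken from the image file name, it helps to see that mapping on its own. The sketch below is a hypothetical helper (<code>name_from_path</code> is not in the original code) that turns a path such as ./test/person/alice.jpg into the label "alice". Unlike the listing above, it leaves dot-files such as ".hidden" unchanged rather than producing an empty name.</div>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derive a person's name from an image path by
// dropping the directory part and the final extension.
std::string name_from_path(const std::string& path) {
    assert(!path.empty());  // an empty path is treated as a caller error

    // Keep only what follows the last '/'.
    const std::size_t slash = path.find_last_of('/');
    std::string base = (slash == std::string::npos) ? path : path.substr(slash + 1);

    // Strip the extension, but leave names that start with '.' untouched.
    const std::size_t dot = base.find_last_of('.');
    if (dot != std::string::npos && dot != 0) {
        base = base.substr(0, dot);
    }
    return base;
}
```

<div>So a folder containing alice.jpg and bob.png would register two people, printed as "alice" and "bob" in the console output.</div>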

<div> </div>

<div>Results</div>

<div>[Attachment: result demonstration, not preserved in this archive]</div>

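<div>Following the official rkmpi example, it should also be possible to stream over RTSP to view the inference results with boxes and labels drawn, but that is somewhat more involved. I will see whether I have time to finish it later.<img height="50" src="https://bbs.eeworld.com.cn/static/editor/plugins/hkemoji/sticker/facebook/wanwan21.gif" width="63" /></div>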

<div> </div>

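<div>Judging by the results, there is still room for improvement. In my earlier Python tests with the ONNX model, the distance for a correct match was mostly around 0.8, which corresponds to a score mostly above 0.5. The gap is most likely precision loss from int8 quantization. Recognition may be unreliable in some complex scenes, but single-person detection works well enough.</div>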
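<div>To give a feel for how int8 quantization can shift distances, here is a small sketch of generic affine quantization, where a real value is approximated as (q - zero_point) * scale. The function names are hypothetical, and the zero point and scale a given model actually uses are not stated in the post, so the values passed in are purely illustrative.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Affine dequantization: real = (q - zero_point) * scale.
float dequantize(int8_t q, int32_t zero_point, float scale) {
    return (static_cast<float>(q) - static_cast<float>(zero_point)) * scale;
}

// Affine quantization with rounding and clamping to the int8 range.
int8_t quantize(float x, int32_t zero_point, float scale) {
    assert(scale > 0.0f);
    const long q = std::lround(x / scale) + zero_point;
    return static_cast<int8_t>(std::min(127L, std::max(-128L, q)));
}

// Euclidean distance between a vector and its quantize/dequantize round trip.
float round_trip_error(const std::vector<float>& v, int32_t zero_point, float scale) {
    float sum = 0.0f;
    for (float x : v) {
        const float d = x - dequantize(quantize(x, zero_point, scale), zero_point, scale);
        sum += d * d;
    }
    return std::sqrt(sum);
}
```

<div>Each element can be off by up to half a step (scale / 2), and those per-element errors add up over a 512-dimensional embedding, which is one plausible way a correct match could drift from a distance near 0.8 towards the 1.2 threshold.</div>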

<div> </div>

<div>In the next installment we will combine this with the LuckFox Pico's pin I/O to build our smart access-control snapshot system.</div>
</div>

<p></p>

wangerxian posted on 2024-12-30 13:52

<p>Multi-face detection is a very common requirement these days, and the recognition speed looks decent.</p>
