Last edited by 周国维 on 2024-12-6 09:05
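Continuing from the previous post. With face recognition working through InspireFace, the next step is to run face detection with an RKNN model instead. First, though, we need to understand the properties of the current model.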
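The Pikachu file downloaded last time is an archive (a tarball). Add a .tar suffix to the file name and run tar -xvf Pikachu.tar to extract it, or tar -tvf Pikachu.tar to just list its contents. Note that tar -cvf would try to create a new archive rather than read this one.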
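If you would rather skip the renaming step, Python's standard tarfile module can list the archive directly: it detects the format from the file header, not the file name, and mode "r:*" also handles gzip/bz2/xz compression. A minimal sketch; list_archive is just an illustrative name:

```python
import sys
import tarfile


def list_archive(path):
    """Return (name, size_in_bytes) for every member of a tar archive.

    mode "r:*" lets tarfile auto-detect compression, and the format is read
    from the file header, so the file does not need a .tar suffix.
    """
    with tarfile.open(path, mode="r:*") as tar:
        return [(m.name, m.size) for m in tar.getmembers()]


if __name__ == "__main__":
    # Usage: python list_archive.py Pikachu
    for name, size in list_archive(sys.argv[1]):
        print(f"{size:>10}  {name}")
```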
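The archive contains various AI models, and __inspire__ is the description (manifest) file. Looking at the properties of face_detect_320, the model used last time, the key points are: the model type is MNN, input and output precision are both float32, and the inference platform is the CPU.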
An MNN model cannot be converted to an RKNN model; what we need is the ONNX version. After some searching on GitHub I found one:
https://bj.bcebos.com/paddlehub/fastdeploy/scrfd_500m_bnkps_shape320x320.onnx
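You can open the model in Netron to inspect its operators. Make sure the RV1106 supports every one of them; otherwise you will have to implement custom operators. The supported-operator list is rknn-toolkit2/doc/05_RKNN_Compiler_Support_Operator_List_V2.3.0.pdf in the airockchip/rknn-toolkit2 repository (master branch).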
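Besides eyeballing the graph in Netron, the onnx Python package can list every operator type the model uses, which makes the comparison against that PDF easier. A minimal sketch, assuming the onnx package is installed; count_op_types and unsupported_ops are illustrative names, and the supported set has to be typed in by hand from the PDF (none is provided here):

```python
import sys
from collections import Counter


def count_op_types(model):
    """Count how often each operator type appears in an ONNX model's main graph.

    `model` is an onnx.ModelProto (e.g. from onnx.load). Subgraphs nested
    inside If/Loop nodes are not traversed.
    """
    return Counter(node.op_type for node in model.graph.node)


def unsupported_ops(op_counts, supported):
    """Return the op types (sorted) that are not in the hand-written `supported` set."""
    return sorted(op for op in op_counts if op not in supported)


if __name__ == "__main__":
    import onnx  # pip install onnx

    # Usage: python ops.py scrfd_500m_bnkps_shape320x320.onnx
    model = onnx.load(sys.argv[1])
    for op, n in sorted(count_op_types(model).items()):
        print(f"{op:<24}{n}")
```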
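Next comes the ONNX to RKNN conversion. I won't cover setting up the rknn-toolkit2 environment here; there are plenty of guides online, just follow one. Below is my conversion script: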
```python
import sys

from rknn.api import RKNN

if __name__ == '__main__':
    model_path = "./Python/model/onnx/scrfd_500m_bnkps_shape320x320.onnx"  # input ONNX model
    rknn_path = "./Python/model/scrfd_500m_bnkps_shape320x320.rknn"        # output RKNN model
    dataset_path = "./Python/model/dataset_320.txt"  # calibration image list, one path per line

    rknn = RKNN(verbose=True)

    print('--> Config model')
    rknn.config(mean_values=[123.675, 116.28, 103.53], std_values=[58.395, 57.12, 57.375],
                target_platform='rv1106', quantized_algorithm='mmse')

    print('--> Loading model')
    ret = rknn.load_onnx(model=model_path)
    if ret != 0:
        print('Load model failed!')
        sys.exit(ret)

    print('--> Building model')
    # The RV1106 NPU only runs quantized models, so quantization must be enabled
    ret = rknn.build(do_quantization=True, dataset=dataset_path)
    if ret != 0:
        print('Build model failed!')
        sys.exit(ret)

    print('--> Export rknn model')
    ret = rknn.export_rknn(rknn_path)
    if ret != 0:
        print('Export rknn model failed!')
        sys.exit(ret)

    rknn.release()
```
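mean_values and std_values affect the accuracy of the RKNN model and may need manual tuning: they should reproduce the normalization the model was trained with (for reference, insightface's own SCRFD preprocessing uses a mean of 127.5 and a std of 128 for every channel). quantized_algorithm='mmse' is chosen for better accuracy, at the cost of a slower conversion.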
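As I understand rknn.config, mean_values and std_values are folded into the converted model, so the NPU computes (pixel - mean) / std per channel before the first layer and you feed it raw uint8 pixels. A small NumPy sketch of that arithmetic, comparing the constants from the conversion script (ImageNet statistics) with the SCRFD reference constants; normalize and the constant names are illustrative:

```python
import numpy as np


def normalize(img, mean, std):
    """Per-channel (pixel - mean) / std, the arithmetic that
    rknn.config(mean_values=..., std_values=...) asks the model to apply.

    `img` is an HxWxC uint8 array; its channel order must match what the
    model was trained with.
    """
    img = img.astype(np.float32)
    return (img - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)


# Constants used in the conversion script (ImageNet statistics, RGB order)
IMAGENET_MEAN = [123.675, 116.28, 103.53]
IMAGENET_STD = [58.395, 57.12, 57.375]
# Constants used by insightface's reference SCRFD preprocessing
SCRFD_MEAN = [127.5, 127.5, 127.5]
SCRFD_STD = [128.0, 128.0, 128.0]

if __name__ == "__main__":
    white = np.full((1, 1, 3), 255, dtype=np.uint8)
    print("imagenet:", normalize(white, IMAGENET_MEAN, IMAGENET_STD).ravel())
    print("scrfd:   ", normalize(white, SCRFD_MEAN, SCRFD_STD).ravel())
```

The two settings map the same pixel to quite different values (roughly 2.2 versus 1.0 for white), so a mismatch shifts every activation the model sees, which shows up as lost accuracy after quantization.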
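The dataset is the critical part. The model's input is 320×320, so the calibration images should also be 320×320; otherwise accuracy suffers. I copied a script and adapted it to letterbox the images: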
```python
import os
import glob
import shutil

import cv2


class BOX_RECT:
    """Padding added on each side by letterbox()."""
    def __init__(self):
        self.left = 0
        self.right = 0
        self.top = 0
        self.bottom = 0


def save_jpg_paths(folder_path):
    # Recursively find all .jpg files under folder_path
    jpg_files = glob.glob(os.path.join(folder_path, '**/*.jpg'), recursive=True)
    # Return paths relative to the current working directory
    return [os.path.relpath(file) for file in jpg_files]


def letterbox(image, target_size, pad_color=(114, 114, 114)):
    # target_size is (height, width); scale so the whole image fits, keeping the aspect ratio
    h, w = image.shape[:2]
    scale = min(target_size[0] / h, target_size[1] / w)
    new_w, new_h = int(scale * w), int(scale * h)
    resized_image = cv2.resize(image, (new_w, new_h))
    # Split the leftover space evenly between opposite sides
    pad_width = target_size[1] - new_w
    pad_height = target_size[0] - new_h
    pads = BOX_RECT()
    pads.left = pad_width // 2
    pads.right = pad_width - pads.left
    pads.top = pad_height // 2
    pads.bottom = pad_height - pads.top
    # Fill the border with a constant color
    padded_image = cv2.copyMakeBorder(resized_image, pads.top, pads.bottom, pads.left, pads.right,
                                      cv2.BORDER_CONSTANT, value=pad_color)
    return padded_image, pads, scale


# Example usage
folder_path = './Python/model/lfw'  # replace with your image folder
images_list = sorted(save_jpg_paths(folder_path))
if not images_list:
    raise SystemExit(f'No .jpg files found under {folder_path}')

cali_num = min(31, len(images_list))             # number of calibration images
jump_num = max(len(images_list) // cali_num, 1)  # sample evenly across the whole folder
cali_list = images_list[0::jump_num][:cali_num]

input_w = 320
input_h = input_w
target_size = (input_h, input_w)  # (height, width), as letterbox() expects
pad_color = (0, 0, 0)
save_txt = "./Python/model/dataset_320.txt"  # the dataset path the conversion script reads
save_path = "./calib"
absolute_save_path = os.path.abspath(save_path)

# Empty the output directory if it exists, otherwise create it
if os.path.exists(absolute_save_path):
    for filename in os.listdir(absolute_save_path):
        file_path = os.path.join(absolute_save_path, filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)
        except Exception as e:
            print(f'Failed to delete {file_path}. Reason: {e}')
else:
    os.makedirs(absolute_save_path)

count = 0
with open(save_txt, "w") as f:
    for img_path in cali_list:
        img_ndarray = cv2.imread(img_path)
        if img_ndarray is None:  # unreadable or corrupt file
            print(f'Skipping unreadable image {img_path}')
            continue
        count += 1
        img_suffix = os.path.splitext(img_path)[-1]
        processed_img, pads, scale = letterbox(img_ndarray, target_size, pad_color)
        img_save_abs_path = os.path.join(absolute_save_path, str(count) + img_suffix)
        cv2.imwrite(img_save_abs_path, processed_img)
        # rknn.build(dataset=...) expects one image path per line
        f.write(img_save_abs_path + "\n")
```
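At inference time, if the camera frame is letterboxed the same way before going into the model, the detected boxes come out in 320×320 input coordinates and have to be mapped back before drawing. A sketch using the pads and scale that letterbox() returns; unletterbox_box is an illustrative name:

```python
def unletterbox_box(box, pad_left, pad_top, scale, orig_w, orig_h):
    """Map an (x1, y1, x2, y2) box from letterboxed-input coordinates back to
    the original image: remove the padding offset, undo the resize scale,
    then clamp to the original image bounds."""
    x1, y1, x2, y2 = box

    def fx(x):
        return min(max((x - pad_left) / scale, 0.0), orig_w)

    def fy(y):
        return min(max((y - pad_top) / scale, 0.0), orig_h)

    return fx(x1), fy(y1), fx(x2), fy(y2)
```

For a 640×480 frame, letterbox() gives scale = 0.5 and pads.top = 40 (pads.left = 0), so a box (100, 60, 200, 160) on the model input maps to (200, 40, 400, 240) on the frame.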
The rknn-toolkit2 API reference is rknn-toolkit2/doc/03_Rockchip_RKNPU_API_Reference_RKNN_Toolkit2_V2.3.0_CN.pdf in the airockchip/rknn-toolkit2 repository (master branch).
Running the conversion script produces scrfd_500m_bnkps_shape320x320.rknn, which has one input tensor and nine output tensors.
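The nine outputs are SCRFD's three heads (score, bbox, 5-point landmarks) at three strides (8, 16, 32). A small sketch that computes the expected output shapes for a 320×320 input, assuming 2 anchors per grid cell (as in insightface's SCRFD _bnkps models) and outputs grouped as score, bbox, kps. That grouping is what the post-processing's index arithmetic (i, i+3, i+6) relies on, but check the actual order, and any leading batch dimension, in Netron:

```python
INPUT_SIZE = 320
STRIDES = (8, 16, 32)
NUM_ANCHORS = 2  # anchors per grid cell (assumption; verify in Netron)


def scrfd_output_shapes(input_size=INPUT_SIZE, strides=STRIDES, num_anchors=NUM_ANCHORS):
    """Expected (name, rows, cols) of the 9 outputs, grouped score/bbox/kps per stride."""
    shapes = []
    for kind, cols in (("score", 1), ("bbox", 4), ("kps", 10)):
        for s in strides:
            rows = (input_size // s) ** 2 * num_anchors  # grid cells x anchors
            shapes.append((f"{kind}_{s}", rows, cols))
    return shapes


if __name__ == "__main__":
    for i, (name, rows, cols) in enumerate(scrfd_output_shapes()):
        print(i, name, (rows, cols))
```

For a 320×320 input this gives 3200, 800 and 200 anchors for strides 8, 16 and 32.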
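When I first got the RKNN model file, I assumed I could just edit the __inspire__ description file and pack the RKNN model into the archive.
But when I ran the sample program, bad news arrived: InspireFace is not compatible with the RV1106.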
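InspireFace assumes the RKNN output type is uint8, whereas the RV1106 NPU's default quantized output type is int8. The RV1106 NPU also does not support floating point, only integer types, so the model has to be quantized for the conversion to work at all.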
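To make the int8/uint8 issue concrete: RKNN describes each quantized tensor with a scale and a zero point (the scale and zp fields of rknn_tensor_attr), and a real value x is stored as the int8 code q = round(x / scale) + zp. A minimal NumPy sketch of that affine scheme with made-up scale and zero-point values. It also shows what happens when the same int8 bytes are read as uint8, which is presumably what goes wrong when a uint8-only consumer receives an int8 output:

```python
import numpy as np


def quantize_int8(x, scale, zp):
    """Asymmetric (affine) int8 quantization: q = clamp(round(x / scale) + zp, -128, 127)."""
    q = np.round(np.asarray(x, np.float32) / scale) + zp
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize(q, scale, zp):
    """Inverse mapping x = (q - zp) * scale, same formula as deqnt_affine_to_f32 in the C++ code."""
    return (q.astype(np.float32) - zp) * scale


if __name__ == "__main__":
    scale, zp = 0.01, -10  # made-up quantization parameters
    x = np.array([-1.0, 0.0, 0.5, 1.2], np.float32)
    q = quantize_int8(x, scale, zp)
    print("int8 codes:    ", q)
    print("dequantized:   ", dequantize(q, scale, zp))
    # Reinterpreting the same bytes as uint8 turns every negative code
    # into a large positive one, so the dequantized values become garbage.
    print("bytes as uint8:", q.view(np.uint8))
```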
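On top of that, some of the RKNN API functions InspireFace calls are not supported on the RV1106, which means inputs and outputs can only be set up and read through the zero-copy API (rknn_create_mem / rknn_set_io_mem).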
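At that point it felt like there was no reason to port InspireFace to the RV1106 at all. RKNN alone can quantize the model and deploy it on the RV1106, so why bolt on an incompatible middle layer? I can't help wondering whether the event organizers ever tried the port themselves. It felt redundant, so I ended up writing the code from scratch.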
Here is the zero-copy setup I wrote following Rockchip's documentation:
```cpp
// Needs <cstdio>, <cstring>, <vector> and "rknn_api.h".
// model_path, dump_tensor_attr() and height_/width_/channel_ are defined elsewhere in the class.
rknn_context rk_ctx_;                        ///< RKNN context handle.
rknn_input_output_num rk_io_num_;            ///< Number of model inputs and outputs.
std::vector<rknn_tensor_attr> input_attrs_;  ///< Attributes of the input tensors.
std::vector<rknn_tensor_attr> output_attrs_; ///< Attributes of the output tensors.
std::vector<rknn_tensor_mem> input_mems;     ///< Zero-copy input buffers.
std::vector<rknn_tensor_mem> output_mems;    ///< Zero-copy output buffers.

// With size == 0, rknn_init treats the second argument as a model file path
int ret = rknn_init(&rk_ctx_, (void *)model_path, 0, 0, NULL);
if (ret < 0) {
    printf("rknn_init fail! ret=%d\n", ret);
    return -1;
}

rknn_sdk_version version;
ret = rknn_query(rk_ctx_, RKNN_QUERY_SDK_VERSION, &version, sizeof(rknn_sdk_version));
if (ret < 0) {
    printf("rknn_query SDK_VERSION fail! ret=%d\n", ret);
    return -1;
}
printf("sdk version: %s driver version: %s\n", version.api_version, version.drv_version);

ret = rknn_query(rk_ctx_, RKNN_QUERY_IN_OUT_NUM, &rk_io_num_, sizeof(rk_io_num_));
if (ret != RKNN_SUCC) {
    printf("rknn_query IN_OUT_NUM fail! ret=%d\n", ret);
    return -1;
}
printf("model input num: %u, output num: %u\n", rk_io_num_.n_input, rk_io_num_.n_output);

input_attrs_.resize(rk_io_num_.n_input);
output_attrs_.resize(rk_io_num_.n_output);
input_mems.resize(rk_io_num_.n_input);
output_mems.resize(rk_io_num_.n_output);

for (uint32_t i = 0; i < rk_io_num_.n_input; ++i) {
    memset(&input_attrs_[i], 0, sizeof(input_attrs_[i]));
    input_attrs_[i].index = i;  // which input rknn_query should describe
    ret = rknn_query(rk_ctx_, RKNN_QUERY_INPUT_ATTR, &input_attrs_[i], sizeof(rknn_tensor_attr));
    if (ret != RKNN_SUCC) {
        printf("rknn_query INPUT_ATTR fail! ret=%d\n", ret);
        return -1;
    }
    dump_tensor_attr(&input_attrs_[i]);
    if (input_attrs_[i].fmt == RKNN_TENSOR_NCHW) {
        channel_ = input_attrs_[i].dims[1];
        height_ = input_attrs_[i].dims[2];
        width_ = input_attrs_[i].dims[3];
    } else {  // NHWC
        height_ = input_attrs_[i].dims[1];
        width_ = input_attrs_[i].dims[2];
        channel_ = input_attrs_[i].dims[3];
    }
    printf("input %u: height=%d, width=%d, channel=%d\n", i, height_, width_, channel_);

    // Feed raw uint8 NHWC pixels; the mean/std normalization is baked into the model
    input_attrs_[i].type = RKNN_TENSOR_UINT8;
    input_attrs_[i].fmt = RKNN_TENSOR_NHWC;
    rknn_tensor_mem *mem = rknn_create_mem(rk_ctx_, input_attrs_[i].size_with_stride);
    if (mem == NULL) {
        printf("rknn_create_mem for input %u fail!\n", i);
        return -1;
    }
    // Copies the struct; keep `mem` too if you want to call rknn_destroy_mem() later
    input_mems[i] = *mem;
    ret = rknn_set_io_mem(rk_ctx_, &input_mems[i], &input_attrs_[i]);
    if (ret < 0) {
        printf("input_mems rknn_set_io_mem fail! ret=%d\n", ret);
        return -1;
    }
}

for (uint32_t i = 0; i < rk_io_num_.n_output; ++i) {
    memset(&output_attrs_[i], 0, sizeof(output_attrs_[i]));
    output_attrs_[i].index = i;
    ret = rknn_query(rk_ctx_, RKNN_QUERY_OUTPUT_ATTR, &output_attrs_[i], sizeof(rknn_tensor_attr));
    if (ret != RKNN_SUCC) {
        printf("rknn_query OUTPUT_ATTR fail! ret=%d\n", ret);
        return -1;
    }
    dump_tensor_attr(&output_attrs_[i]);
    rknn_tensor_mem *mem = rknn_create_mem(rk_ctx_, output_attrs_[i].size_with_stride);
    if (mem == NULL) {
        printf("rknn_create_mem for output %u fail!\n", i);
        return -1;
    }
    output_mems[i] = *mem;
    ret = rknn_set_io_mem(rk_ctx_, &output_mems[i], &output_attrs_[i]);
    if (ret < 0) {
        printf("output_mems rknn_set_io_mem fail! ret=%d\n", ret);
        return -1;
    }
}
```
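The input buffer allocated above is size_with_stride bytes, which can be larger than height × width × channels when rows are padded out to w_stride (a field of rknn_tensor_attr). As far as I can tell from Rockchip's RV1106 examples, they handle this by copying the image row by row at a pitch of w_stride × channels whenever width != w_stride. The layout, sketched in NumPy; pack_rows_with_stride is an illustrative name, and if dump_tensor_attr() shows w_stride equal to the width, a single memcpy of the whole image is enough:

```python
import numpy as np


def pack_rows_with_stride(img, w_stride):
    """Lay out an HxWxC uint8 image in a buffer whose rows are w_stride pixels wide.

    Each row of pixels is followed by (w_stride - W) padding pixels, left as zero.
    Returns the flat bytes that would be copied into the zero-copy input buffer.
    """
    h, w, c = img.shape
    if w_stride < w:
        raise ValueError("w_stride must be >= image width")
    buf = np.zeros((h, w_stride, c), dtype=np.uint8)
    buf[:, :w, :] = img
    return buf.reshape(-1)
```

The resulting bytes are what would go into input_mems[0].virt_addr before calling rknn_run().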
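With RKNN configured, the remaining steps are to run the model (rknn_run), read the output data, post-process it and draw the boxes. Most of the post-processing follows cpp/inspireface/track_module/face_detect/face_detect.cpp in InspireFace, with one extra step: dequantizing the int8 outputs to float32.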
```cpp
// Dequantize an affine-quantized int8 value: x = (q - zp) * scale
static float deqnt_affine_to_f32(int8_t qnt, int32_t zp, float scale)
{
    return ((float)qnt - (float)zp) * scale;
}

// Read one output tensor from its zero-copy buffer and convert it to float32
static std::vector<float> dequant_output(const rknn_tensor_mem &mem, const rknn_tensor_attr &attr)
{
    const int8_t *data = (const int8_t *)mem.virt_addr;  // RV1106 outputs are int8, not uint8
    std::vector<float> out;
    out.reserve(attr.n_elems);
    for (uint32_t j = 0; j < attr.n_elems; ++j)
        out.push_back(deqnt_affine_to_f32(data[j], attr.zp, attr.scale));
    return out;
}

// Outputs i, i+3 and i+6 hold the score, box and landmark tensors for strides[i]
std::vector<FaceLoc> results;
const std::vector<int> strides = {8, 16, 32};
for (size_t i = 0; i < strides.size(); ++i) {
    std::vector<float> tensor_cls = dequant_output(output_mems[i], output_attrs_[i]);
    std::vector<float> tensor_box = dequant_output(output_mems[i + 3], output_attrs_[i + 3]);
    std::vector<float> tensor_lmk = dequant_output(output_mems[i + 6], output_attrs_[i + 6]);
    m_face_detect_._decode(tensor_cls, tensor_box, tensor_lmk, strides[i], results);
}
```
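InspireFace's _decode is not reproduced here. For reference, this is a sketch of standard SCRFD decoding for one stride, following insightface's reference implementation (distance2bbox / distance2kps): each anchor sits at a grid-cell corner position (x·stride, y·stride), box outputs are distances to the four edges, and landmark outputs are offsets from the anchor, both in units of the stride. It assumes dequantized float inputs, scores that have already been through a sigmoid, and 2 anchors per cell; decode_scrfd_stride is an illustrative name:

```python
import numpy as np


def decode_scrfd_stride(scores, bbox_deltas, kps_deltas, stride, input_size=320,
                        num_anchors=2, score_thresh=0.5):
    """Decode one stride level of SCRFD outputs.

    scores: (N,), bbox_deltas: (N, 4), kps_deltas: (N, 10), where
    N = (input_size // stride) ** 2 * num_anchors.
    Returns boxes (M, 4) as x1, y1, x2, y2, scores (M,) and landmarks (M, 5, 2)
    in input-image pixel coordinates.
    """
    feat = input_size // stride
    # Anchor positions (x, y) of each grid cell, repeated once per anchor in that cell
    ys, xs = np.mgrid[:feat, :feat]
    centers = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float32) * stride
    centers = np.repeat(centers, num_anchors, axis=0)

    scores = np.asarray(scores, np.float32).reshape(-1)
    bbox = np.asarray(bbox_deltas, np.float32).reshape(-1, 4) * stride
    kps = np.asarray(kps_deltas, np.float32).reshape(-1, 5, 2) * stride

    keep = scores >= score_thresh
    c = centers[keep]
    # distance2bbox: left/top distances are subtracted, right/bottom distances added
    boxes = np.concatenate([c - bbox[keep, :2], c + bbox[keep, 2:]], axis=1)
    # distance2kps: each landmark is an offset from the anchor position
    landmarks = c[:, None, :] + kps[keep]
    return boxes, scores[keep], landmarks
```

Run it for strides 8, 16 and 32, concatenate the results, then apply NMS before drawing the boxes.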
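Running the program, one detection took 0.15 s, and with some more optimization I think it could get under 0.1 s. Compared with the earlier 0.45 s float32 run on the CPU, that is about 3× faster (0.45 / 0.15 = 3), in line with moving from float32 CPU inference to int8 NPU inference.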
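A single timing can be skewed by first-run effects, so it is worth reporting the median over several runs after a few warm-up calls. A sketch of that pattern (the same idea applies in C++ with std::chrono around rknn_run()); measure_latency and speedup are illustrative names:

```python
import statistics
import time


def measure_latency(fn, warmup=3, runs=20):
    """Call fn() `warmup` times untimed, then `runs` times timed; return the median in seconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


def speedup(baseline_s, new_s):
    """How many times faster new_s is than baseline_s."""
    return baseline_s / new_s


if __name__ == "__main__":
    print(f"{speedup(0.45, 0.15):.1f}x")  # the timings reported in this post
```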
The detection results look great.
Next I'll see whether it can be optimized further, and try out pruning and distillation.