【玄铁杯第三届RISC-V应用创新大赛】LicheePi 4A+004曲线救国的手势识别

sipower · 发表于2023-10-31 21:20

【玄铁杯第三届RISC-V应用创新大赛】LicheePi 4A+004曲线救国的手势识别 [复制链接]

本篇介绍实现手势识别的曲折过程。在程序调试通过后，做了一个视频演示。

1、最初的方案

在我最开始的任务申请中，计划是采用mediapipe识别手势，然后分类，操控下位机在像素屏上显示不同的灯光效果。拿到开发板之前，我先在PC上把框架搭建好了，基本实现手势识别并把对应代码发给下位机，只等拿到板子移植一下就大功告成。

开发板拿到手后，仔细查阅相关文档，发现mediapipe没有基于risc-v的轮子，这个开发库自己编译也不太现实，因为用到的编译工具也没有risc-v平台的示例。

后来咨询官方技术支持，建议使用平头哥提供好的AI推理框架，也就是yolox教程里的步骤。mediapipe暂时是不用考虑了。然后，我开始了苦逼的YOLO平台学习。

2、学习YOLOv5

首先说一下为什选择YOLOv5这个版本，因为我从网上查到的做手势识别的教程，基本都是在这个版本上实现的，为了少走弯路，最好的方式是拿来主义。

2.1、开发环境安装

本着直冲目标的学习方式，先研读了网上能搜到的大部分中文资料，基本上搞明白了图像识别的训练、部署过程，虽然没搞明白具体的工作原理，但是好歹可以试着上手实操一下。

搭建开发环境还算顺利，主要遇到的问题是在CUDA适配的时候，总也不好使，不能启动GPU做运算。反反复复重装了好多次，才发现下载的驱动版本不对，按照官网指定的对应关系重新安装，终于好用了。

如下是安装好的轮子：

(venv) PS E:\TEST\Python\test> pip list
Package                 Version
----------------------- ------------
absl-py                 1.4.0       
annotated-types         0.6.0       
anyio                   3.7.1       
asttokens               2.4.0       
attrs                   23.1.0      
backcall                0.2.0       
cachetools              5.3.1       
certifi                 2023.7.22   
cffi                    1.15.1      
charset-normalizer      3.3.0       
click                   8.1.7       
colorama                0.4.6       
contourpy               1.1.0       
cycler                  0.11.0      
Cython                  3.0.4       
decorator               5.1.1       
EasyProcess             1.1         
entrypoint2             1.1         
executing               2.0.0       
fastapi                 0.104.0     
filelock                3.12.4      
flatbuffers             23.5.26     
fonttools               4.42.1      
fsspec                  2023.9.2    
google-auth             2.23.3      
google-auth-oauthlib    1.1.0       
grpcio                  1.59.0
h11                     0.14.0
idna                    3.4
ipython                 8.16.1
jedi                    0.19.1
Jinja2                  3.1.2
joblib                  1.3.2
keyboard                0.13.5
kiwisolver              1.4.5
Markdown                3.5
MarkupSafe              2.1.3
matplotlib              3.7.2
matplotlib-inline       0.1.6
mediapipe               0.10.3
MouseInfo               0.1.3
mpmath                  1.3.0
mss                     9.0.1
networkx                3.2
numpy                   1.25.2
oauthlib                3.2.2
opencv-contrib-python   4.8.0.76
opencv-python           4.8.0.76
packaging               23.1
pandas                  2.1.1
parso                   0.8.3
pickleshare             0.7.5
Pillow                  10.0.0
pip                     23.3
prompt-toolkit          3.0.39
protobuf                3.20.3
pure-eval               0.2.2
pyasn1                  0.5.0
pyasn1-modules          0.3.0
PyAutoGUI               0.9.54
pycparser               2.21
pydantic                2.4.2
pydantic_core           2.10.1
PyGetWindow             0.0.9
Pygments                2.16.1
PyMsgBox                1.0.9
pyparsing               3.0.9
pyperclip               1.8.2
PyRect                  0.2.0
pyscreenshot            3.1
PyScreeze               0.1.29
pyserial                3.5
python-dateutil         2.8.2
pytweening              1.0.7
pytz                    2023.3.post1
PyYAML                  6.0.1
requests                2.31.0
requests-oauthlib       1.3.1
rsa                     4.9
scikit-learn            1.3.1
scipy                   1.11.3
seaborn                 0.13.0
setuptools              65.5.1
six                     1.16.0
sniffio                 1.3.0
sounddevice             0.4.6
stack-data              0.6.3
starlette               0.27.0
sympy                   1.12
tensorboard             2.15.0
tensorboard-data-server 0.7.1
threadpoolctl           3.2.0
torch                   2.1.0+cu121
torchaudio              2.1.0+cu121
torchvision             0.16.0+cu121
tqdm                    4.66.1
traitlets               5.11.2
typing_extensions       4.8.0
tzdata                  2023.3
urllib3                 2.0.7
uvicorn                 0.23.2
wcwidth                 0.2.8
Werkzeug                3.0.0
wheel                   0.38.4

[notice] To update, run: python.exe -m pip install --upgrade pip
(venv) PS E:\TEST\Python\test>

2.2、数据集收集

想训练出好的模型，首先得有足够的大数据集。最开始我想自己建数据集，然后一通检索，发现如果只有很少的训练图片，训练的模型基本不能用，弄太多图片，自己一个人精力有限，做出来不太现实。想在网上找现成标注好的，发现都是收费的，而且也不知道能不能用。下图是找到的部分训练集图片，存在各种问题。

找数据集过程中，在GitHub上找到几个训练好的手势识别模型，基于此，我决定换个思路，先看看这些模型是不是好用，如果好用，直接拿来主义，还避免我这小破本跑模型训练再累死。

2.3测试现成模型

从诸多分享的模型中，筛选出两个可用的，我先测试了来自Northwestern University大学的研究生Jordan Zeeb分享的方案。他一共尝试了三种方法：

第一种方法使用openCV的基本图像阈值来寻找应用掩模后手指的轮廓。这个作为最基础的方法，受限较多，只能在特定情形下识别成功，仅作为openCV学习的初探。第二种方法使用pytorch中的机器学习原理来创建用于图像分类的CNN。这个也是应用场景受限，我实测识别率也不太理想。不过通过这个可以熟悉搭建神经网络过程。第三个使用YOLOv5的对象检测来识别美国手语中字母表的所有字母。从作者演示视频看，识别成功率还是很高的，但是我实际测试，并不是很理想。可能是因为作者自己建的数据库，样本数量和种类太少造成的，只是识别他自己比较准确。通过该作者3个演示方案，基本就能学会实际应用中手势识别的相关训练和部署了。下面是该作者的GitHub地址：

链接已隐藏，如需查看请登录或者注册

接下来测试从百度网盘找到的一个训练好的模型，地址如下：

链接已隐藏，如需查看请登录或者注册

提取码：68vu

这个模型使用datawhale组织提供的数据集，链接如下：

https://gas.graviti.cn/dataset/datawhale/HandPose

教程参考：

链接已隐藏，如需查看请登录或者注册

实测识别准确率还是不错的，具体测试代码如下：

import argparse
import time
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, check_requirements, check_imshow, non_max_suppression, apply_classifier, \
    scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized


def detect(save_img=False):
    source, weights, view_img, save_txt, imgsz = opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    save_img = not opt.nosave and not source.endswith('.txt')  # save inference images
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://', 'https://'))

    # Directories
    save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Initialize
    set_logging()
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    stride = int(model.stride.max())  # model stride
    imgsz = check_img_size(imgsz, s=stride)  # check img_size
    if half:
        model.half()  # to FP16

    # Second-stage classifier
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']).to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

    # Run inference
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    t0 = time.time()
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Inference
        t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t2 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
            else:
                p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # img.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or view_img:  # Add bbox to image
                        label = f'{names[int(cls)]} {conf:.2f}'
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

            # Print time (inference + NMS)
            print(f'{s}Done. ({t2 - t1:.3f}s)')

            # Stream results
            if view_img:
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                            save_path += '.mp4'
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer.write(im0)

    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        print(f"Results saved to {save_dir}{s}")

    print(f'Done. ({time.time() - t0:.3f}s)')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # 使用什么权重
    parser.add_argument('--weights', nargs='+', type=str, default='best.pt', help='model.pt path(s)')
    # 测试的文件来源
    parser.add_argument('--source', type=str, default='0', help='source')  # file/folder, 0 for webcam
    # 测试的图像大小，这个参数和使用的model相关
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    # 筛选候选框的置信度阈值，值越大框越少
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    # NMS舍去其他临近框的iou的阈值，越大同一个目标可能的框就越多
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    # 使用的gpu编号
    parser.add_argument('--device', default='0', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    # 是否将结果展示出来，加上 --view-img 即可
    parser.add_argument('--view-img', action='store_true', help='display results')
    # 是否保存检测结果为  *.txt
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    # 是否保存置信度到本地
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    # 不保存检测结果的txt和img到本地
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    # 按照某个类的label标签号，在检测结果中过滤掉这个类
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    #
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    # 保存检测结果的文件夹
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    # 每一次检测结果保存的子文件夹名
    parser.add_argument('--name', default='exp', help='save results to project/name')
    #
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
    print(opt)
    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()

效果图如下：

这个一定要用GPU做测试，用CPU的话，感觉它着急的快要爆掉了。然后我决定就用这个现成模型完成作业。另外也尝试了一些其他模型，效果都不理想，略过不表。

2.4、模型部署

研究教程，要想把模型搞到开发板上运行，需要先在Linux里的容器中交叉编译成th1520 平台上的可执行文件，然后再导入到开发板中执行，步骤狠多。对于仅有一点微薄Linux知识的我来说，可谓步步艰辛，处处遇卡，直到我最终放弃了这个方式，因为再墨迹下去，时间就来不及了。

我的目标，还是先完成作业为先。经过多方查找资料，我采用“曲线救国”方式实现功能。具体实施分两部分：在我的小破本上用fastapi和uvicorn构建一个服务端，用来跑模型识别运算。在开发板上跑一个客户端，用来采集图片发送给服务端，然后接收服务端识别完的结果，对结果进行解析，并转化成需要的控制数据。

服务端代码如下：

import time
import torch
import torch.backends.cudnn as cudnn
from numpy import random

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, check_requirements, check_imshow, non_max_suppression, apply_classifier, \
    scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized
from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel
import json
import uvicorn
import cv2
import numpy as np
import io
import base64

app = FastAPI()


def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)


device = select_device('0')
half = device.type != 'cpu'  # half precision only supported on CUDA

# Load model
weights = './best.pt'
model = attempt_load(weights, map_location=device)  # load FP32 model
stride = int(model.stride.max())  # model stride
imgsz = check_img_size(640, s=stride)  # check img_size
if half:
    model.half()  # to FP16


def detect(source, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic_nms=False):
    # img0 = cv2.imread(source)  # BGR
    img0 = source
    # Padded resize
    img = letterbox(img0, imgsz, stride=stride)[0]

    # Convert
    img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416
    img = np.ascontiguousarray(img)

    # Get names and colors---获取classnamelist
    names = model.module.names if hasattr(model, 'module') else model.names

    # Run inference
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    t0 = time.time()
    pred_list = []
    # print(img)
    img = torch.from_numpy(img).to(device)
    img = img.half() if half else img.float()  # uint8 to fp16/32
    img /= 255.0  # 0 - 255 to 0.0 - 1.0
    if img.ndimension() == 3:
        img = img.unsqueeze(0)

    # Inference
    pred = model(img, augment=False)[0]
    # Apply NMS
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes=classes, agnostic=agnostic_nms)
    # print(pred)
    if pred:
        for p in pred:
            pred_list.append(p.tolist())
    # print(f'Done. ({time.time() - t0:.3f}s)')
    return pred_list


def base64_to_image(base64_code):
    # base64解码
    img_data = base64.b64decode(base64_code)
    # 转换为np数组
    img_array = np.fromstring(img_data, np.uint8)
    # 转换成opencv可用格式
    img = cv2.imdecode(img_array, cv2.COLOR_RGB2BGR)

    return img


class Image(BaseModel):
    img: str


@app.post('/detect')
def detect_fun(image: Image):
    img = base64_to_image(image.img)
    # img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    pred_list = detect(source=img)
    # print(pred_list)
    # cv2.imwrite('xxx.jpg', img)
    return {'state': 'success', 'answer': pred_list}


if __name__ == '__main__':
    # fast:app 中的 fast=运行的文件名,如果修改了记得这里别忘记改
    uvicorn.run("interface_of_model:app", host="0.0.0.0", port=8000, reload=True)
    # pred_list = detect(weights='./runs/train/exp8/weights/best.pt', source="/home/zk/git_projects/hand_pose/hand_pose_yolov5_5.0/hand_pose/images/four_fingers10.jpg")
    # print(pred_list)

客户端代码如下：

import requests
import cv2
import base64
import json
import random

import serial
import struct

# Init serial port
Usart = serial.Serial(
    port="COM3",  # '/dev/ttyUSB0',  # 串口
    baudrate=115200,  # 波特率
    timeout=0.001)

# 判断串口是否打开成功
if Usart.isOpen():
    print("serial port open success")
else:
    print("serial port open failed")



def run():
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    while True:
        ret, frame = cap.read()
        if not ret:
            print('cap error,system exit')
            break
        # 反转图像
        frame = cv2.flip(frame, 1)
        # 检测手位置
        hands_pos = detect(frame)
        # 在复制的图层进行表情绘制
        draw_frame = frame.copy()
        # 如果检测的人脸位置不为空才进来预测表情
        if len(hands_pos) != 0:
            # 预测人脸的表情
            draw_hands(draw_frame, hands_pos)

        # 展示视频画面
        cv2.imshow('video capture', draw_frame)
        key = cv2.waitKey(10) & 0xff
        if key == 27:  # 等待按下 ESC 键退出
            cap.release()
            cv2.destroyAllWindows()
            Usart.close()
            break

def image_to_base64(image_np):
    image = cv2.imencode('.jpg', image_np)[1]
    image_code = str(base64.b64encode(image))[2:-1]
    return image_code

def detect(frame):
    files = {'img': image_to_base64(frame)}
    # print(files)
    response = requests.post('http://127.0.0.1:8000/detect', json.dumps(files))
    # print(response.json())
    return response.json()['answer'][0]

def draw_hands(draw_frame, hands_pos):
    classes = ["four", "five", "one", "little", "ok",
               "zero", "hand", "horns", "three", "thumbup", "two"]
    # print(hands_pos)
    send_bufer = [170,0,0,0]
    for hand_pos in hands_pos:
        plot_one_box(hand_pos[:4], draw_frame, [255,0,0], label=classes[int(hand_pos[-1])], confidence=hand_pos[-2])
        if (hand_pos[-2] > 0.55):
            print(hand_pos)
            x_p = 16 * hand_pos[0] / (640 - (hand_pos[2] - hand_pos[0]))  # 映射x->16
            x_p = x_p if x_p < 16 else 15
            y_p = 16 * hand_pos[1] / (480 - (hand_pos[3] - hand_pos[1]))  # 映射y->16
            y_p = y_p if y_p < 16 else 15
            y_p = 15 - y_p

            send_bufer[0] = 170  # header = 0xAA
            send_bufer[1] = int(hand_pos[-1])
            send_bufer[2] = int(x_p)
            send_bufer[3] = int(y_p)
            print(send_bufer)
            send_data = struct.pack("%dB" % (len(send_bufer)), *send_bufer)  # 解析成16进制
            Usart.write(send_data)  # 发送
# x是 预测结果前4个
def plot_one_box(x, img, color=None, label=None, line_thickness=3, confidence=0):
    # Plots one bounding box on image img
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(img, label+'{}'.format(format(confidence, '.3f')), (c1[0], c1[1] - 2), 0, tl / 3, [0, 255, 0],
                    thickness=tf, lineType=cv2.LINE_AA)

if __name__ == '__main__':
    run()

像素屏使用一块基于ESP32的Arduino小板驱动，主要实现把手势在屏幕窗口中的坐标映射到16x16像素的点上，不同的手势显示不同颜色，具体代码如下：

Arduino代码：

#define FASTLED_ALL_PINS_HARDWARE_SPI
#include <FastLED.h>

#define LED_PIN  23
#define COLOR_ORDER GRB
#define CHIPSET     WS2812B
//
// Mark's xy coordinate mapping code.  See the XYMatrix for more information on it.
//

// Params for width and height
const uint8_t kMatrixWidth = 16;
const uint8_t kMatrixHeight = 16;

#define MAX_DIMENSION ((kMatrixWidth>kMatrixHeight) ? kMatrixWidth : kMatrixHeight)
#define NUM_LEDS (kMatrixWidth * kMatrixHeight)

// Param for different pixel layouts
const bool    kMatrixSerpentineLayout = true;


uint8_t key_p = 0;
uint8_t key_x = 0;
uint8_t key_y = 0;
uint8_t key_flag = 0;

uint16_t XY( uint8_t x, uint8_t y)
{
  uint16_t i;

  if( kMatrixSerpentineLayout == false) {
    i = (y * kMatrixWidth) + x;
  }

  if( kMatrixSerpentineLayout == true) {
    if( y & 0x01) {
      // Odd rows run backwards
      uint8_t reverseX = (kMatrixWidth - 1) - x;
      i = (y * kMatrixWidth) + reverseX;
    } else {
      // Even rows run forwards
      i = (y * kMatrixWidth) + x;
    }
  }

  return i;
}

// The leds
CRGB leds[kMatrixWidth * kMatrixHeight];

void setup() {
  // uncomment the following lines if you want to see FPS count information
  Serial.begin(115200);
  Serial.println("resetting!");
  delay(3000);
  FastLED.addLeds<CHIPSET, LED_PIN, COLOR_ORDER>(leds, NUM_LEDS).setCorrection(TypicalSMD5050);
  FastLED.setBrightness(96);
}


void Serial_recive()
{
  uint8_t rcv_buff[4];
  uint8_t rcv_flag = 0,rcv_num = 0;

  while(Serial.available()>0){
    if(rcv_flag == 0)
    {
      rcv_buff[0] = Serial.read();
      if(rcv_buff[0] == 0xAA)
      {
        rcv_flag = 1;
        rcv_num = 1;
      }
    }
    else
    {
      rcv_buff[rcv_num] = Serial.read();
      rcv_num++;
      if(rcv_num > 3)
      {
        rcv_flag = 0;
        key_p = rcv_buff[1];
        key_x = rcv_buff[2];
        key_y = rcv_buff[3];
        key_flag = 1;
      }
    }
  }
}


void loop() {
  uint8_t ihue=0;
  char buff[50];

  Serial_recive();
  if(key_flag == 1)
  {
    key_flag = 0;
    
    ihue = key_p*22;
    FastLED.clearData();
    leds[XY(key_x,key_y)] = CHSV(ihue,255,250);

    FastLED.show();
    sprintf(buff, "h=%d",ihue);
    Serial.println(buff);
  }
  //delay(10);
}

整体运行起来，演示视频如下：

demo

3、总结

由于本人水平有限，目前只能通过“曲线救国”方式实现手势识别功能。等官方或论坛里的实例更多一些，我再尝试把运算部分移植到开发板上。

jobszheng5

真棒

看上去识别的准确度还是蛮高的

【玄铁杯第三届RISC-V应用创新大赛】LicheePi 4A+004曲线救国的手势识别 [复制链接]

1、最初的方案

2、学习YOLOv5

2.1、开发环境安装

2.2、数据集收集

2.3测试现成模型

2.4、模型部署

3、总结

最新回复