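Receptive field: after an image passes through different convolution kernels and a cascade of convolutional layers, each element of the output feature map corresponds to a specific region of the original image. That region is the element's receptive field.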
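To make the idea concrete, the size of the receptive field can be accumulated layer by layer: each layer with kernel size k widens it by (k - 1) times the product of the strides of all earlier layers. The sketch below assumes plain convolution or pooling layers without dilation; the function name `receptive_field` is illustrative, not part of any library.

```python
def receptive_field(layers):
    """Return the receptive field size (in input pixels) of one output element.

    layers: list of (kernel_size, stride) pairs, ordered from input to output.
    Assumes no dilation; padding does not change the size, only the position.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between neighbouring elements of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])      # three stacked 3x3 convolutions
conv_pool_conv = receptive_field([(3, 1), (2, 2), (3, 1)])   # 3x3 conv, 2x2 pooling with stride 2, 3x3 conv
```

Stacking three 3x3 convolutions with stride 1 gives the same receptive field as a single 7x7 convolution, which is one reason small kernels are often stacked.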
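IoU (Intersection over Union): in object detection, the area of the intersection of two boxes (for example an anchor box and a ground-truth box) divided by the area of their union. It is commonly used to describe how accurately a box is localized.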
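A minimal sketch of the IoU computation, assuming boxes are given as (left, top, right, bottom) corner coordinates. The helper name `box_iou_xyxy` is made up for this example.

```python
def box_iou_xyxy(box_a, box_b):
    """IoU of two axis-aligned boxes given as (left, top, right, bottom)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


overlap = box_iou_xyxy((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

An IoU of 1 means the two boxes coincide, and 0 means they do not overlap at all.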
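AP50, AP75, AP95 and mAP (mean Average Precision): in object detection, these metrics describe a model's detection performance. APxx is the average precision when a detection counts as correct only if its IoU with a ground-truth box is at least 0.xx, and mAP is the mean of AP over all classes.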
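As a rough illustration of how one AP value is obtained, the sketch below computes the area under the precision-recall curve for a single class at a single IoU threshold. It assumes each detection has already been marked as a true or false positive at that threshold, and it uses all-point interpolation, which differs from the 101-point sampling of the COCO evaluation; the function name is hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class at one IoU threshold."""
    if num_ground_truth == 0:
        return 0.0
    # Visit detections from the most to the least confident
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Replace each precision by the best precision at any higher recall
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles under the resulting step curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


ap_example = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, False],
    num_ground_truth=2,
)
```

Running this once per class and averaging the results gives mAP at that IoU threshold.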
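torchvision provides several mature object detection models, including Faster R-CNN, FCOS, RetinaNet, SSD and SSDlite. Each model comes in one or more improved variants that differ in speed and accuracy. All of them ship with parameters trained on the COCO dataset, which can be loaded when the model is created, so the model can be used for object detection right away. All the detection models live in the torchvision.models.detection subpackage.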
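Pretrained models in Torchvision
The raw output is not intuitive to inspect directly, so some post-processing is needed to present it better. The post-processing mainly includes: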
import torchvision.transforms.functional as F
from torchvision.models import detection
import numpy as np
from torchvision.io import read_image,ImageReadMode
import torch as tc
import visdom
from PIL import Image, ImageDraw,ImageFont
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
colormap = [[0, 0, 0],  # 0=background (the comments give PASCAL VOC class names)
[128, 0, 0],  # 1=aeroplane
[0, 128, 0],  # 2=bicycle
[128, 128, 0],  # 3=bird
[0, 0, 128],  # 4=boat
[128, 0, 128],  # 5=bottle
[0, 128, 128],  # 6=bus
[128, 128, 128],  # 7=car
[64, 0, 0],  # 8=cat
[192, 0, 0],  # 9=chair
[64, 128, 0],  # 10=cow
[192, 128, 0],  # 11=dining table
[64, 0, 128],  # 12=dog
[192, 0, 128],  # 13=horse
[64, 128, 128],  # 14=motorbike
[192, 128, 128],  # 15=person
[0, 64, 0],  # 16=potted plant
[128, 64, 0],  # 17=sheep
[0, 192, 0],  # 18=sofa
[128, 192, 0],  # 19=train
[0, 64, 128],  # 20=tv/monitor
[128, 64, 128],  # not defined from here on
[ 0, 192, 128],
[128, 192, 128],
[ 64, 64, 0],
[192, 64, 0],
[ 64, 192, 0],
[192, 192, 0],
[ 64, 64, 128],
[192, 64, 128],
[ 64, 192, 128],
[192, 192, 128],
[ 0, 0, 64],
[128, 0, 64],
[ 0, 128, 64],
[128, 128, 64],
[ 0, 0, 192],
[128, 0, 192],
[ 0, 128, 192],
[128, 128, 192],
[ 64, 0, 64],
[192, 0, 64],
[ 64, 128, 64],
[192, 128, 64],
[ 64, 0, 192],
[192, 0, 192],
[ 64, 128, 192],
[192, 128, 192],
[ 0, 64, 64],
[128, 64, 64],
[ 0, 192, 64],
[128, 192, 64],
[ 0, 64, 192],
[128, 64, 192],
[ 0, 192, 192],
[128, 192, 192],
[ 64, 64, 64],
[192, 64, 64],
[ 64, 192, 64],
[192, 192, 64],
[ 64, 64, 192],
[192, 64, 192],
[ 64, 192, 192],
[192, 192, 192],
[ 32, 0, 0],
[160, 0, 0],
[ 32, 128, 0],
[160, 128, 0],
[ 32, 0, 128],
[160, 0, 128],
[ 32, 128, 128],
[160, 128, 128],
[ 96, 0, 0],
[224, 0, 0],
[ 96, 128, 0],
[224, 128, 0],
[ 96, 0, 128],
[224, 0, 128],
[ 96, 128, 128],
[224, 128, 128],
[ 32, 64, 0],
[160, 64, 0],
[ 32, 192, 0],
[160, 192, 0],
[ 32, 64, 128],
[160, 64, 128],
[ 32, 192, 128],
[160, 192, 128],
[ 96, 64, 0],
[224, 64, 0],
[ 96, 192, 0],
[224, 192, 0],
[ 96, 64, 128],
[224, 64, 128],
[ 96, 192, 128],
[224, 192, 128],
[ 32, 0, 64],
[160, 0, 64],
[ 32, 128, 64],
[160, 128, 64],
[ 32, 0, 192],
[160, 0, 192],
[ 32, 128, 192],
[160, 128, 192],
[ 96, 0, 64],
[224, 0, 64],
[ 96, 128, 64],
[224, 128, 64],
[ 96, 0, 192],
[224, 0, 192],
[ 96, 128, 192],
[224, 128, 192],
[ 32, 64, 64],
[160, 64, 64],
[ 32, 192, 64],
[160, 192, 64],
[ 32, 64, 192],
[160, 64, 192],
[ 32, 192, 192],
[160, 192, 192],
[ 96, 64, 64],
[224, 64, 64],
[ 96, 192, 64],
[224, 192, 64],
[ 96, 64, 192],
[224, 64, 192],
[ 96, 192, 192],
[224, 192, 192],
[ 0, 32, 0],
[128, 32, 0],
[ 0, 160, 0],
[128, 160, 0],
[ 0, 32, 128],
[128, 32, 128],
[ 0, 160, 128],
[128, 160, 128],
[ 64, 32, 0],
[192, 32, 0],
[ 64, 160, 0],
[192, 160, 0],
[ 64, 32, 128],
[192, 32, 128],
[ 64, 160, 128],
[192, 160, 128],
[ 0, 96, 0],
[128, 96, 0],
[ 0, 224, 0],
[128, 224, 0],
[ 0, 96, 128],
[128, 96, 128],
[ 0, 224, 128],
[128, 224, 128],
[ 64, 96, 0],
[192, 96, 0],
[ 64, 224, 0],
[192, 224, 0],
[ 64, 96, 128],
[192, 96, 128],
[ 64, 224, 128],
[192, 224, 128],
[ 0, 32, 64],
[128, 32, 64],
[ 0, 160, 64],
[128, 160, 64],
[ 0, 32, 192],
[128, 32, 192],
[ 0, 160, 192],
[128, 160, 192],
[ 64, 32, 64],
[192, 32, 64],
[ 64, 160, 64],
[192, 160, 64],
[ 64, 32, 192],
[192, 32, 192],
[ 64, 160, 192],
[192, 160, 192],
[ 0, 96, 64],
[128, 96, 64],
[ 0, 224, 64],
[128, 224, 64],
[ 0, 96, 192],
[128, 96, 192],
[ 0, 224, 192],
[128, 224, 192],
[ 64, 96, 64],
[192, 96, 64],
[ 64, 224, 64],
[192, 224, 64],
[ 64, 96, 192],
[192, 96, 192],
[ 64, 224, 192],
[192, 224, 192],
[ 32, 32, 0],
[160, 32, 0],
[ 32, 160, 0],
[160, 160, 0],
[ 32, 32, 128],
[160, 32, 128],
[ 32, 160, 128],
[160, 160, 128],
[ 96, 32, 0],
[224, 32, 0],
[ 96, 160, 0],
[224, 160, 0],
[ 96, 32, 128],
[224, 32, 128],
[ 96, 160, 128],
[224, 160, 128],
[ 32, 96, 0],
[160, 96, 0],
[ 32, 224, 0],
[160, 224, 0],
[ 32, 96, 128],
[160, 96, 128],
[ 32, 224, 128],
[160, 224, 128],
[ 96, 96, 0],
[224, 96, 0],
[ 96, 224, 0],
[224, 224, 0],
[ 96, 96, 128],
[224, 96, 128],
[ 96, 224, 128],
[224, 224, 128],
[ 32, 32, 64],
[160, 32, 64],
[ 32, 160, 64],
[160, 160, 64],
[ 32, 32, 192],
[160, 32, 192],
[ 32, 160, 192],
[160, 160, 192],
[ 96, 32, 64],
[224, 32, 64],
[ 96, 160, 64],
[224, 160, 64],
[ 96, 32, 192],
[224, 32, 192],
[ 96, 160, 192],
[224, 160, 192],
[ 32, 96, 64],
[160, 96, 64],
[ 32, 224, 64],
[160, 224, 64],
[ 32, 96, 192],
[160, 96, 192],
[ 32, 224, 192],
[160, 224, 192],
[ 96, 96, 64],
[224, 96, 64],
[ 96, 224, 64],
[224, 224, 64],
[ 96, 96, 192],
[224, 96, 192],
[ 96, 224, 192],
[224, 224, 192]]
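The 256-entry table above follows the bit-interleaving scheme commonly used for the PASCAL VOC palette, so it can be generated instead of typed out. The sketch below is one way to do that; the function name `voc_colormap` is chosen for this example.

```python
def voc_colormap(num_colors=256):
    """Generate the PASCAL VOC style palette as a list of [r, g, b] lists."""
    palette = []
    for index in range(num_colors):
        r = g = b = 0
        c = index
        # Spread the bits of the index over the high bits of the three channels
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        palette.append([r, g, b])
    return palette


generated = voc_colormap()
```

Neighbouring indices get clearly different colours, which is why this palette is convenient for telling classes apart.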
def draw_bounding_boxes(imgchwtensor, boxestensor, labels=['Object'], colors=['red'], fill=False, width=1, font='simhei', font_size=12, textcolor='white'):
    # imgchwtensor: C x H x W image tensor; boxestensor: N x 4 tensor of boxes as (left, top, right, bottom)
    # textcolor is an assumed parameter; the listing uses it without defining it
    # Scale float images in [0, 1] to [0, 255] before converting to uint8
    x = (imgchwtensor * (255 if imgchwtensor.max() <= 1 else 1)).byte()
    image = Image.fromarray(x.permute(1, 2, 0).numpy())
    draw = ImageDraw.Draw(image)
    try:
        font = ImageFont.truetype(font, font_size)
    except Exception:
        font = ImageFont.load_default()  # fall back when the requested font is not available
    numofbox = len(boxestensor)
    # A single colour or label is shared by all boxes
    colors = colors * numofbox if len(colors) == 1 else colors
    labels = labels * numofbox if len(labels) == 1 else labels
    for i in range(numofbox):
        box = [int(v) for v in boxestensor[i].tolist()]
        label = labels[i]
        draw.rectangle(xy=box, fill=colors[i] if fill else None, outline=colors[i], width=width)  # box
        if label != '':
            _, _, w, h = font.getbbox(label)  # text width, height
            outside = box[1] - h >= 0  # label fits outside box
            draw.rectangle(
                (box[0], box[1] - h if outside else box[1], box[0] + w + 1,
                 box[1] + 1 if outside else box[1] + h + 1),
                fill=colors[i])  # label background
            draw.text((box[0], box[1] - h if outside else box[1]), label, fill=textcolor, font=font)
    return np.asarray(image).transpose((2, 0, 1))
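# RetinaNet is the model named in the listing; any model in torchvision.models.detection can be used instead
model = detection.retinanet_resnet50_fpn(weights=detection.RetinaNet_ResNet50_FPN_Weights.DEFAULT)
model.eval()  # switch to inference mode
imgorg = read_image('test.jpg', ImageReadMode.RGB)  # 'test.jpg' is a placeholder path
img = F.convert_image_dtype(imgorg, tc.float)
with tc.no_grad():
    detection_outputs = model([img,])
threshold = 0.7  # keep detections whose class score is greater than 0.7
keep = detection_outputs[0]['scores'] > threshold
boxes = detection_outputs[0]['boxes'][keep]
labels = detection_outputs[0]['labels'][keep].tolist()
scores = detection_outputs[0]['scores'][keep].tolist()
labelnames = [COCO_INSTANCE_CATEGORY_NAMES[label] + f'({scores[idx]:.2f})' for idx, label in enumerate(labels)]
colors = [tuple(colormap[i]) for i in labels]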