RetinaFace(人脸检测/PyTorch)
RetinaFace is a powerful single-stage face detection model that performs pixel-wise face localisation across a wide range of face scales by exploiting joint extra-supervised and self-supervised multi-task learning.
This notebook is a hands-on reproduction of the RetinaFace paper. The model follows the architecture proposed in "RetinaFace: Single-stage Dense Face Localisation in the Wild". The algorithm loads a model pre-trained on WiderFace and performs transfer learning on the user's dataset. We provide the training code and a trainable model for fine-tuning in real-world scenarios.
Notes:
1. Framework used in this case: PyTorch 1.4.0
2. Hardware used in this case: GPU: 1 × NVIDIA V100 NV32 (32 GB) | CPU: 8 cores, 64 GB
3. How to run the code: click the triangular Run button in the menu bar at the top of this page, or press Ctrl+Enter, to execute each code cell
4. For detailed JupyterLab usage, see the [ModelArts JupyterLab User Guide](https://bbs.huaweicloud.com/forum/thread-97603-1-1.html)
5. If you run into problems, see the [ModelArts JupyterLab FAQ](https://bbs.huaweicloud.com/forum/thread-98681-1-1.html)
1. Download Data and Code
Run the code below to download and extract the data and code.
This case uses the WIDER FACE dataset.
import os
# Download the data and code
!wget https://obs-aigallery-zc.obs.cn-north-4.myhuaweicloud.com/algorithm/RetinaFace.zip
# Extract the archive
os.system('unzip RetinaFace.zip -d ./')
--2021-06-25 15:19:18-- https://obs-aigallery-zc.obs.cn-north-4.myhuaweicloud.com/algorithm/RetinaFace.zip
Resolving proxy-notebook.modelarts.com (proxy-notebook.modelarts.com)... 192.168.6.62
Connecting to proxy-notebook.modelarts.com (proxy-notebook.modelarts.com)|192.168.6.62|:8083... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1997846711 (1.9G) [application/zip]
Saving to: ‘RetinaFace.zip’
RetinaFace.zip 100%[===================>] 1.86G 177MB/s in 13s
2021-06-25 15:19:31 (149 MB/s) - ‘RetinaFace.zip’ saved [1997846711/1997846711]
0
2. Model Training
2.1 Load Dependencies
This may take a few minutes; please be patient.
from __future__ import print_function
import os
root_path = './RetinaFace/'
os.chdir(root_path)
import torch
import torch.optim as optim
import torch.backends.cudnn as cudnn
import argparse
import torch.utils.data as data
from data import WiderFaceDetection, detection_collate, preproc, cfg_mnet, cfg_re50,cfg_re152
from layers.modules import MultiBoxLoss
from layers.functions.prior_box import PriorBox
import time
import datetime
import math
from models.retinaface import RetinaFace
from eval import eval_run
from eval_standard import run_eval_standard
from widerface_evaluate.evaluation import evaluation
===================Install cython_bbox successful========================
2.2 Parameter Settings and Dependency Installation
Installing the dependencies takes a few minutes; please be patient.
parser = argparse.ArgumentParser(description='Retinaface Training')
parser.add_argument('--data_url', default='./WIDER_train/', help='Training dataset directory')
parser.add_argument('--train_url', default='./output/', help='Location to save checkpoint models')
parser.add_argument('--data_format', type=str, default="zip", help='zip or dir')
parser.add_argument('--network', default='resnet50', help='Backbone network mobile0.25 , resnet50 or resnet152')
parser.add_argument('--num_workers', default=1, type=int, help='Number of workers used in dataloading')
parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, help='momentum')
parser.add_argument('--load_weight', default='weight/best_model.pth', help='resume net for retraining')
parser.add_argument('--resume_epoch', default=0, type=int, help='resume iter for retraining')
parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD')
parser.add_argument('--img_size', default=1024, type=int,help='image size')
parser.add_argument('--test_origin_size', default=False, help='Whether use origin image size to evaluate')
parser.add_argument('--confidence_threshold', default=0.02, type=float, help='confidence_threshold')
parser.add_argument('--nms_threshold', default=0.4, type=float, help='nms_threshold')
parser.add_argument('--gpu_train', default=True, type=bool, help='gpu or cpu train')
parser.add_argument('--num_gpu', default=1, type=int, help='if 1,use one gpu,is more than 1,use all gpus')
parser.add_argument('--batch_size', default=16, type=int, help='train batch_size')
parser.add_argument('--epoch', default=1, type=int, help='train epoch')
parser.add_argument('--use_backbone', default='True', type=str, help='use backbone pretrain')
parser.add_argument('--is_eval_in_train', default=False, type=bool, help='Do eval on the val dataset after every train epoch')
parser.add_argument('--use_mixed', default='True',type=str, help='')
parser.add_argument('--amp_level', default='O1', help='mixed_precision level,eg:O0,O1,O2,O3')
parser.add_argument('--warmup_epoch', default=10, type=int, help='lr warm up epoch')
parser.add_argument('--decay1', default=50, type=int, help='lr first decay epoch')
parser.add_argument('--decay2', default=80, type=int, help='lr second decay epoch')
parser.add_argument('--use_cosine_decay', default='True', type=str, help='use cosine_decay_learning_rate')
parser.add_argument('--optimizer', default='sgd', help='sgd or adam')
parser.add_argument('--eval', default='False', type=str,help='')
parser.add_argument('--init_method', help='')  # required when running on ModelArts; not used in this code
args, unknown = parser.parse_known_args()
if args.eval == 'True' and ((args.load_weight is None) or (args.load_weight == 'None') or (args.load_weight == '')):
    raise Exception('when "eval" is set to True, "load_weight" must be set to a weight path')
try:
    from moxing.framework import file
    if args.train_url is not None:
        save_folder = args.train_url
except:
    save_folder = args.train_url
    print('Not running on the ModelArts platform')
if args.eval == 'True':
    args.use_mixed = False
if args.use_mixed == 'True':
    mixed_precision = True
    try:  # Mixed precision training https://github.com/NVIDIA/apex
        from apex import amp
    except:
        print("install apex")
        os.system('pip --default-timeout=100 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ' + './apex-master')
        try:
            from apex import amp
        except:
            print('Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex')
            mixed_precision = False  # not installed
else:
    mixed_precision = False
if not os.path.exists(save_folder):
    os.makedirs(save_folder)
cfg = None
if args.network == "mobile0.25":
    cfg = cfg_mnet
elif args.network == "resnet50":
    cfg = cfg_re50
elif args.network == "resnet152":
    cfg = cfg_re152
if args.eval == 'False':
    if args.img_size is not None:
        cfg['image_size'] = args.img_size
    if args.gpu_train is not None:
        cfg['gpu_train'] = args.gpu_train
    if args.num_gpu is not None:
        cfg['ngpu'] = args.num_gpu
    if args.batch_size is not None:
        cfg['batch_size'] = args.batch_size
    if args.epoch is not None:
        cfg['epoch'] = args.epoch
    if args.decay1 is not None:
        cfg['decay1'] = args.decay1
    if args.decay2 is not None:
        cfg['decay2'] = args.decay2
rgb_mean = (104, 117, 123)  # per-channel means in BGR order
num_classes = 2
img_dim = cfg['image_size']
num_gpu = cfg['ngpu']
batch_size = cfg['batch_size']
max_epoch = cfg['epoch']
gpu_train = cfg['gpu_train']
num_workers = args.num_workers
momentum = args.momentum
weight_decay = args.weight_decay
initial_lr = args.lr
gamma = args.gamma
training_dataset = os.path.join(args.data_url,'label.txt')
INFO:root:Using MoXing-v2.0.0.rc0-19e4d3ab
INFO:root:Using OBS-Python-SDK-3.20.9.1
install apex
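The parameter cell above calls `parse_known_args()` rather than `parse_args()` so that extra platform-injected flags (such as `--init_method` on ModelArts) do not crash the parser. A minimal sketch of this behavior, with hypothetical argument values chosen only for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=1e-3)

# parse_known_args() returns (namespace, leftover_args) instead of raising
# an error on flags the parser does not recognize, which is why platform
# flags like --init_method can be passed through harmlessly.
args, unknown = parser.parse_known_args(['--lr', '0.01', '--init_method', 'tcp://x'])
print(args.lr)
print(unknown)
```

Known flags are parsed normally; everything unrecognized is returned untouched in the second element.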
2.3 Create the Model
device = torch.device("cuda:0" if args.gpu_train else "cpu")
backbone_pretrain = None
if args.use_backbone == 'True':
    if args.network == 'mobile0.25':
        backbone_pretrain = './backbone_pretrain/mobilenetV1X0.25_pretrain.tar'
    if args.network == 'resnet50':
        backbone_pretrain = './backbone_pretrain/resnet50-19c8e357.pth'
    if args.network == 'resnet152':
        backbone_pretrain = './backbone_pretrain/resnet152-b121ed2d.pth'
net = RetinaFace(cfg=cfg, backbone_pretrain=backbone_pretrain).to(device)
print("Printing net...")
if args.load_weight is not None:
    print('Loading resume network...')
    state_dict = torch.load(args.load_weight)
    # create a new OrderedDict that does not contain the `module.` prefix
    from collections import OrderedDict
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        head = k[:7]
        if head == 'module.':
            name = k[7:]  # remove `module.`
        else:
            name = k
        new_state_dict[name] = v
    net.load_state_dict(new_state_dict)
cudnn.benchmark = True
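The loop above strips the `module.` prefix that `torch.nn.DataParallel` adds to every parameter key when a wrapped model is saved. The same idea as a compact standalone helper (the key names in the test are illustrative only):

```python
from collections import OrderedDict

def strip_module_prefix(state_dict):
    """Remove a leading 'module.' from every key, as left behind by
    checkpoints saved from a DataParallel-wrapped model."""
    out = OrderedDict()
    for k, v in state_dict.items():
        out[k[7:] if k.startswith('module.') else k] = v
    return out
```

Keys without the prefix pass through unchanged, so the helper is safe on both wrapped and unwrapped checkpoints.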
2.4 Loss Function, Optimizer, and Training Functions
if args.eval == 'False':
    if args.optimizer == 'sgd':
        optimizer = optim.SGD(net.parameters(), lr=initial_lr, momentum=momentum, weight_decay=weight_decay)
    elif args.optimizer == 'adam':
        optimizer = optim.Adam(net.parameters(), lr=initial_lr, weight_decay=weight_decay)
    criterion = MultiBoxLoss(num_classes, 0.35, True, 0, True, 7, 0.35, False)
    # Mixed precision training https://github.com/NVIDIA/apex
    if mixed_precision:
        net, optimizer = amp.initialize(net, optimizer, opt_level=args.amp_level, verbosity=0)
if num_gpu > 1 and gpu_train:
    net = torch.nn.DataParallel(net)
priorbox = PriorBox(cfg, image_size=(img_dim, img_dim))
with torch.no_grad():
    priors = priorbox.forward()
    priors = priors.to(device)

def train():
    net.train()
    epoch = 0 + args.resume_epoch
    print('Loading Dataset...')
    dataset = WiderFaceDetection(training_dataset, preproc(img_dim, rgb_mean))
    epoch_size = math.ceil(len(dataset) / batch_size)
    max_iter = max_epoch * epoch_size
    stepvalues = (cfg['decay1'] * epoch_size, cfg['decay2'] * epoch_size)
    step_index = 0
    if args.resume_epoch > 0:
        start_iter = args.resume_epoch * epoch_size
    else:
        start_iter = 0
    best_ap50 = 0
    for iteration in range(start_iter, max_iter):
        if iteration % epoch_size == 0:
            # create batch iterator
            batch_iterator = iter(data.DataLoader(dataset, batch_size, shuffle=True, num_workers=num_workers, collate_fn=detection_collate))
            epoch += 1
        load_t0 = time.time()
        if iteration in stepvalues:
            step_index += 1
        if args.use_cosine_decay == 'True':
            lr = cosine_decay_learning_rate(optimizer, iteration, max_iter, epoch, epoch_size)
        else:
            lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size)
        # load train data
        images, targets = next(batch_iterator)
        images = images.to(device)
        targets = [anno.to(device) for anno in targets]
        # forward
        out = net(images)
        # backprop
        optimizer.zero_grad()
        loss_l, loss_c, loss_landm = criterion(out, priors, targets)
        loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm
        # backward
        if mixed_precision:
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()
        else:
            loss.backward()
        optimizer.step()
        load_t1 = time.time()
        batch_time = load_t1 - load_t0
        eta = int(batch_time * (max_iter - iteration))
        curr_epoch_iteration = iteration % epoch_size
        if curr_epoch_iteration % 10 == 0:
            print('Epoch:{}/{} || Epochiter: {}/{} || Iter: {}/{} || Loc: {:.4f} Cla: {:.4f} Landm: {:.4f} || LR: {:.18f} || Batchtime: {:.4f} s || ETA: {}'
                  .format(epoch, max_epoch, curr_epoch_iteration + 1,
                          epoch_size, iteration + 1, max_iter, loss_l.item(), loss_c.item(), loss_landm.item(), lr, batch_time, str(datetime.timedelta(seconds=eta))))
    model_name = 'RetinaFace_' + cfg['name'] + '_Final.pth'
    save_model_path = os.path.join(save_folder, model_name)
    torch.save(net.state_dict(), save_model_path)

def cosine_decay_learning_rate(optimizer, iteration, max_iter, epoch, epoch_size):
    warmup_epoch = args.warmup_epoch
    if epoch <= warmup_epoch:
        lr = 1e-6 + (initial_lr - 1e-6) * iteration / (epoch_size * warmup_epoch)
    else:
        if warmup_epoch > 0:
            max_iter = max_iter - warmup_epoch * epoch_size
            iteration = iteration - warmup_epoch * epoch_size
        lf = lambda x: (((1 + math.cos(x * math.pi / max_iter)) / 2) ** 1.0)
        lr = initial_lr * lf(iteration)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr

def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size):
    """Sets the learning rate
    # Adapted from PyTorch ImageNet example:
    # https://github.com/pytorch/examples/blob/master/imagenet/main.py
    """
    warmup_epoch = args.warmup_epoch
    if epoch <= warmup_epoch:
        lr = 1e-6 + (initial_lr - 1e-6) * iteration / (epoch_size * warmup_epoch)
    else:
        lr = initial_lr * (gamma ** (step_index))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
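The warmup-plus-cosine schedule implemented above can be sketched as a single pure function (a simplified re-implementation for illustration; it returns the learning rate instead of mutating an optimizer, and the default `initial_lr`/`warmup_epoch` values mirror the arguments defined earlier):

```python
import math

def lr_at(iteration, max_iter, epoch, epoch_size,
          initial_lr=1e-3, warmup_epoch=10):
    """Linear warmup up to initial_lr, then cosine decay toward zero."""
    if epoch <= warmup_epoch:
        # linear ramp from 1e-6 up to initial_lr over the warmup epochs
        return 1e-6 + (initial_lr - 1e-6) * iteration / (epoch_size * warmup_epoch)
    # shift the counters so the cosine curve starts right after warmup
    max_iter -= warmup_epoch * epoch_size
    iteration -= warmup_epoch * epoch_size
    return initial_lr * (1 + math.cos(iteration * math.pi / max_iter)) / 2
```

With 100 epochs of 100 iterations each, the rate climbs during the first 10 epochs, peaks at `initial_lr` right after warmup, and decays to roughly zero by the final iteration.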
2.5 Start Training
Because the dataset is large, a training run takes a few minutes.
if __name__ == '__main__':
    train()
Loading Dataset...
Epoch:1/1 || Epochiter: 1/805 || Iter: 1/805 || Loc: 0.2966 Cla: 0.7142 Landm: 0.6346 || LR: 0.000001000000000000 || Batchtime: 18.7019 s || ETA: 4:10:55
Epoch:1/1 || Epochiter: 11/805 || Iter: 11/805 || Loc: 0.5082 Cla: 0.9243 Landm: 2.6661 || LR: 0.000002240993788820 || Batchtime: 0.8892 s || ETA: 0:11:46
.
.
.
Epoch:1/1 || Epochiter: 791/805 || Iter: 791/805 || Loc: 0.4356 Cla: 0.7774 Landm: 0.6903 || LR: 0.000099038509316770 || Batchtime: 0.8913 s || ETA: 0:00:13
Epoch:1/1 || Epochiter: 801/805 || Iter: 801/805 || Loc: 0.7497 Cla: 0.9509 Landm: 0.6751 || LR: 0.000100279503105590 || Batchtime: 0.9403 s || ETA: 0:00:04
3. Model Testing
3.1 Test Functions
# -*- coding: utf-8 -*-
import numpy as np
from PIL import Image,ImageDraw
import os
import torch
import torch.backends.cudnn as cudnn
import cv2
import time
from models.retinaface import RetinaFace
from utils.box_utils import decode, decode_landm
from utils.timer import Timer
from data import cfg_mnet, cfg_re50,cfg_re152
from layers.functions.prior_box import PriorBox
from utils.nms.py_cpu_nms import py_cpu_nms
class ObjectDetect():
    def __init__(self, model_path):
        torch.set_grad_enabled(False)
        self.cfg = cfg_re50
        if torch.cuda.is_available():
            use_cpu = False
        else:
            use_cpu = True
        self.device = torch.device("cpu" if use_cpu else "cuda")
        self.target_size = 1600
        self.max_size = 2150
        self.origin_size = False
        self.confidence_threshold = 0.02
        self.nms_threshold = 0.4
        self.net = RetinaFace(cfg=self.cfg, phase='test')
        self.net = load_model(self.net, model_path, use_cpu)
        self.net.eval()
        print('load model success')
        cudnn.benchmark = True
        self.net = self.net.to(self.device)

    def predict(self, file_name):
        image = Image.open(file_name).convert('RGB')
        img_rgb = np.array(image)
        img_bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)  # convert the RGB image read by PIL to OpenCV's BGR format
        img = np.float32(img_bgr)
        im_shape = img.shape
        im_size_min = np.min(im_shape[0:2])
        im_size_max = np.max(im_shape[0:2])
        resize = float(self.target_size) / float(im_size_min)
        if np.round(resize * im_size_max) > self.max_size:
            resize = float(self.max_size) / float(im_size_max)
        if self.origin_size:
            resize = 1
        if resize != 1:
            img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_CUBIC)
        im_height, im_width, _ = img.shape
        scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
        img -= (104, 117, 123)
        img = img.transpose(2, 0, 1)  # convert channel-last (HWC) to channel-first (CHW)
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.to(self.device)
        scale = scale.to(self.device)
        loc, conf, landms = self.net(img)  # forward pass
        priorbox = PriorBox(self.cfg, image_size=(im_height, im_width))
        priors = priorbox.forward()
        priors = priors.to(self.device)
        prior_data = priors.data
        boxes = decode(loc.data.squeeze(0), prior_data, self.cfg['variance'])
        boxes = boxes * scale / resize
        boxes = boxes.cpu().numpy()
        scores = conf.squeeze(0).data.cpu().numpy()[:, 1]
        landms = decode_landm(landms.data.squeeze(0), prior_data, self.cfg['variance'])
        scale1 = torch.Tensor([img.shape[3], img.shape[2], img.shape[3], img.shape[2],
                               img.shape[3], img.shape[2], img.shape[3], img.shape[2],
                               img.shape[3], img.shape[2]])
        scale1 = scale1.to(self.device)
        landms = landms * scale1 / resize
        landms = landms.cpu().numpy()
        # ignore low scores
        inds = np.where(scores > self.confidence_threshold)[0]
        boxes = boxes[inds]
        landms = landms[inds]
        scores = scores[inds]
        # keep top-K before NMS
        order = scores.argsort()[::-1]
        # order = scores.argsort()[::-1][:args.top_k]
        boxes = boxes[order]
        landms = landms[order]
        scores = scores[order]
        # do NMS
        dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
        keep = py_cpu_nms(dets, self.nms_threshold)
        # keep = nms(dets, args.nms_threshold, force_cpu=args.cpu)
        dets = dets[keep, :]
        landms = landms[keep]
        # keep top-K after NMS
        # dets = dets[:args.keep_top_k, :]
        # landms = landms[:args.keep_top_k, :]
        dets = np.concatenate((dets, landms), axis=1)
        bboxs = dets
        # draw on a BGR copy; the conversion runs once, outside the loop,
        # so multiple detections don't swap the channels back and forth,
        # and an image with no detections still returns a BGR array
        image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
        for box in bboxs:
            if box[4] > 0.7:
                print(int(box[0]), int(box[1]), int(box[2]), int(box[3]))
                image = cv2.rectangle(image, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 2)
        return image

def check_keys(model, pretrained_state_dict):
    ckpt_keys = set(pretrained_state_dict.keys())
    model_keys = set(model.state_dict().keys())
    used_pretrained_keys = model_keys & ckpt_keys
    unused_pretrained_keys = ckpt_keys - model_keys
    missing_keys = model_keys - ckpt_keys
    print('Missing keys:{}'.format(len(missing_keys)))
    print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys)))
    print('Used keys:{}'.format(len(used_pretrained_keys)))
    assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint'
    return True

def remove_prefix(state_dict, prefix):
    ''' Old style model is stored with all names of parameters sharing common prefix 'module.' '''
    print('remove prefix \'{}\''.format(prefix))
    f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x
    return {f(key): value for key, value in state_dict.items()}

def load_model(model, pretrained_path, load_to_cpu):
    print('Loading pretrained model from {}'.format(pretrained_path))
    if load_to_cpu:
        pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage)
    else:
        device = torch.cuda.current_device()
        pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device))
    if "state_dict" in pretrained_dict.keys():
        pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.')
    else:
        pretrained_dict = remove_prefix(pretrained_dict, 'module.')
    check_keys(model, pretrained_dict)
    model.load_state_dict(pretrained_dict, strict=False)
    return model
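The `decode` call in `predict` converts the network's predicted offsets back into corner-format boxes using the prior boxes and the config's `variance` values. A minimal NumPy sketch of the standard SSD-style decoding (an illustrative re-implementation, assuming priors in center-size `[cx, cy, w, h]` form and the usual `[0.1, 0.2]` variances):

```python
import numpy as np

def decode_boxes(loc, priors, variances=(0.1, 0.2)):
    """loc: (N, 4) predicted offsets; priors: (N, 4) [cx, cy, w, h] boxes.
    Returns (N, 4) corner-format [x1, y1, x2, y2] boxes."""
    # offsets shift the prior center, scaled by the prior size
    centers = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
    # offsets rescale the prior width/height through an exponential
    sizes = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
    return np.hstack([centers - sizes / 2, centers + sizes / 2])
```

With zero offsets the decoded box is just the prior itself converted to corner form, which makes the transform easy to sanity-check.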
3.2 Run Prediction
You can change the test image to one of your own.
import matplotlib.pyplot as plt
Retinaface = ObjectDetect('./output/RetinaFace_Resnet50_Final.pth')
filename = './WIDER_train/images/16--Award_Ceremony/16_Award_Ceremony_Awards_Ceremony_16_110.jpg'
result = Retinaface.predict(filename)
result = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
plt.figure(figsize=(10, 10))  # set the figure size
plt.imshow(result)
plt.show()
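`py_cpu_nms`, used in `predict` above, is standard greedy non-maximum suppression: repeatedly keep the highest-scoring box and drop every remaining box that overlaps it too much. A minimal NumPy sketch of the same algorithm (illustrative, not the project's exact implementation):

```python
import numpy as np

def greedy_nms(dets, thresh):
    """dets: (N, 5) array of [x1, y1, x2, y2, score]; returns kept indices."""
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the kept box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        iou = w * h / (areas[i] + areas[order[1:]] - w * h)
        # survive only if the overlap with the kept box is small enough
        order = order[1:][iou <= thresh]
    return keep
```

Two heavily overlapping detections collapse to the single higher-scoring one, while a distant box survives untouched.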
[Copyright notice] This article is original content by a Huawei Cloud community user. When reprinting, you must credit the source (Huawei Cloud community) and include the article link, author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find suspected plagiarism in this community, please report it by email with supporting evidence; once verified, the infringing content will be removed immediately. Report email: cloudbbs@huaweicloud.com