ModelArts Series (2): Training a Mask-Detection Model
One design goal of any AI platform is to improve development efficiency and shorten the algorithm iteration cycle. In this article I use the ModelArts platform to train a model. During the epidemic, wearing a mask is the most effective way for people to avoid infection with the novel coronavirus, protecting themselves and others at the same time. So today's topic is training a model that recognizes whether people are wearing masks.
Basic approach
Recognition tasks like this come down to object detection. The goal of object detection is to find all objects of interest in an image and determine their position and size. Because objects vary in shape, size and number, and can occlude one another, object detection has long been one of the most challenging problems in computer vision. Two of the best-performing mainstream algorithms are SSD (Single Shot MultiBox Detector: "Single Shot" means the network localizes and classifies objects in a single forward pass, without a separate region-proposal stage; "MultiBox" refers to the set of default boxes of different scales and aspect ratios laid over the feature maps, a bit like many viewfinder frames placed across the picture, where each box is regressed and classified and the per-box results are merged into the final multi-object detections) and YOLO. I trained the mask-recognition model with both of them. Whether SSD or YOLO is used, the detection process can be decomposed into two operations:
- Localization: frame the object with a rectangle (bounding box). A bounding box is usually given as four integers: either the x/y coordinates of its top-left and bottom-right corners, or the top-left corner plus the box's width and height (a short conversion sketch between the two formats follows this list).
- Classification: identify the (dominant) object inside the bounding box.
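As a side note, the helper below is not part of the project code, just an illustration of how the two box formats relate:
# Illustrative helpers (not from the project): convert between the two box formats.
def corners_to_xywh(xmin, ymin, xmax, ymax):
    """(xmin, ymin, xmax, ymax) -> (x, y, width, height)."""
    return xmin, ymin, xmax - xmin, ymax - ymin

def xywh_to_corners(x, y, w, h):
    """(x, y, width, height) -> (xmin, ymin, xmax, ymax)."""
    return x, y, x + w, y + h

print(corners_to_xywh(96, 13, 438, 332))  # (96, 13, 342, 319)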
First we need a dataset. The mask image dataset is available here: https://modelarts-labs-bj4.obs.cn-north-4.myhuaweicloud.com/case_zoo/mask_detect/datasets/mask_detect_datasets.zip.
Notes on the dataset format:
- The dataset follows the Pascal VOC layout: all images are JPEG files and all annotations are XML files, organized strictly according to the Pascal directory conventions. It is split into a training set of 291 images and a test set of 16 images.
- Only the training set is annotated; the test set is not. During training, the training set therefore has to be further split into a training subset and a validation subset (a minimal split sketch follows this list).
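Since that split is needed later, here is a minimal sketch of it, assuming the training images were unzipped to a local ./data/train_img directory (the path is only an example):
import os
import random

img_dir = "./data/train_img"   # example path: wherever the VOC JPEGs were unzipped
img_list = sorted(os.path.join(img_dir, f) for f in os.listdir(img_dir) if f.endswith(".jpg"))

random.seed(10101)             # fixed seed so the split is reproducible
random.shuffle(img_list)

split = int(len(img_list) * 0.8)   # 80% train / 20% validation
train_list, val_list = img_list[:split], img_list[split:]
print(len(train_list), "train images,", len(val_list), "validation images")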
My plan is to train the model not only on the ModelArts platform but also on top of an open-source project, and then compare the two approaches in terms of workflow and model accuracy. Concretely:
1. Start from an open-source project, modify the data loading, data conversion and training parameters, train the model on Huawei Cloud compute, and evaluate the resulting model on the test set;
2. Train and evaluate the model again using the ModelArts development kit;
3. Compare the two models and the two development workflows.
Based on an open-source project
Approach 1
Use a VGG backbone (this approach was eventually abandoned because the network needed too much memory).
1. First, load the VOC-format data and convert it into PyTorch tensors.
The code is as follows:
from PIL import Image
import numpy as np
import os
from torch.utils.data import Dataset
import math
import cv2
import torchvision
import os.path as osp
import torch
import torch.utils.data as data
import xml.etree.ElementTree as ET
import torchvision.transforms as transforms
from utils.data_aug import ColorAugmentation
img_path = "./data/train_img"
label_path = './data/train_label'
img_list = [os.path.join(img_path, x) for x in os.listdir(img_path)]  # list the image files, not the characters of the path string
percent = 0.7
sep_num = int(len(img_list)*percent)
train_list = img_list[:sep_num]
test_list = img_list[sep_num:]
MASK_CLASSES = ('no_mask', 'yes_mask')
class VOCAnnotationTransform(object):
    """Transforms a VOC annotation into a Tensor of bbox coords and label index
    Initialized with a dictionary lookup of classnames to indexes
    Arguments:
        class_to_ind (dict, optional): dictionary lookup of classnames -> indexes
            (default: alphabetic indexing of the mask classes)
        keep_difficult (bool, optional): keep difficult instances or not
            (default: False)
        height (int): height
        width (int): width
    """

    def __init__(self, class_to_ind=None, keep_difficult=False):
        self.class_to_ind = class_to_ind or dict(
            zip(MASK_CLASSES, range(len(MASK_CLASSES))))
        self.keep_difficult = keep_difficult

    def __call__(self, target, width, height):
        """
        Arguments:
            target (annotation) : the target annotation to be made usable
                will be an ET.Element
        Returns:
            a list containing lists of bounding boxes  [bbox coords, class name]
        """
        res = []
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')
            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                cur_pt = int(bbox.find(pt).text) - 1
                # scale x coordinates by width, y coordinates by height
                cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]
            bndbox.append(label_idx)
            res += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
            # img_id = target.find('filename').text[:-4]
        return res  # [[xmin, ymin, xmax, ymax, label_ind], ... ]
class VOCDetection(data.Dataset):
    """VOC Detection Dataset Object
    input is image, target is annotation
    Arguments:
        root (string): filepath to VOCdevkit folder.
        image_set (string): imageset to use (eg. 'train', 'val', 'test')
        transform (callable, optional): transformation to perform on the
            input image
        target_transform (callable, optional): transformation to perform on the
            target `annotation`
            (eg: take in caption string, return tensor of word indices)
        dataset_name (string, optional): which dataset to load
            (default: 'VOC2007')
    """

    def __init__(self, root,
                 # image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
                 transform=None, target_transform=VOCAnnotationTransform(),
                 dataset_name='ModelArtsMask'):
        self.root = root
        # self.image_set = image_sets
        self.transform = transform
        self.target_transform = target_transform
        self.name = dataset_name
        # self._annopath = osp.join('%s', 'Annotations', '%s.xml')
        # self._imgpath = osp.join('%s', 'JPEGImages', '%s.jpg')
        self._annopath = label_path
        self._imgpath = img_path
        self.ids = img_list
        # for (year, name) in image_sets:
        #     rootpath = osp.join(self.root, 'VOC' + year)
        #     for line in open(osp.join(rootpath, 'ImageSets', 'Main', name + '.txt')):
        #         self.ids.append((rootpath, line.strip()))

    def __getitem__(self, index):
        im, gt, h, w = self.pull_item(index)
        return im, gt

    def __len__(self):
        return len(self.ids)
    def pull_item(self, index):
        img_id = self.ids[index]  # img_id is the full path of the jpg file
        # build the annotation path from the image file name
        # (the original repo used a '%s' pattern here)
        name = osp.splitext(osp.basename(img_id))[0]
        target = ET.parse(osp.join(self._annopath, name + '.xml')).getroot()
        img = cv2.imread(img_id)
        height, width, channels = img.shape
        if self.target_transform is not None:
            target = self.target_transform(target, width, height)
        if self.transform is not None:
            target = np.array(target)
            img, boxes, labels = self.transform(img, target[:, :4], target[:, 4])
            # to rgb
            img = img[:, :, (2, 1, 0)]
            # img = img.transpose(2, 0, 1)
            target = np.hstack((boxes, np.expand_dims(labels, axis=1)))
        return torch.from_numpy(img).permute(2, 0, 1), target, height, width
        # return torch.from_numpy(img), target, height, width

    def pull_image(self, index):
        '''Returns the original image object at index in PIL form
        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.
        Argument:
            index (int): index of img to show
        Return:
            PIL img
        '''
        img_id = self.ids[index]
        return cv2.imread(img_id, cv2.IMREAD_COLOR)

    def pull_anno(self, index):
        '''Returns the original annotation of image at index
        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.
        Argument:
            index (int): index of img to get annotation of
        Return:
            list: [img_id, [(label, bbox coords),...]]
                eg: ('001718', [('dog', (96, 13, 438, 332))])
        '''
        img_id = self.ids[index]
        name = osp.splitext(osp.basename(img_id))[0]
        anno = ET.parse(osp.join(self._annopath, name + '.xml')).getroot()
        gt = self.target_transform(anno, 1, 1)
        return name, gt

    def pull_tensor(self, index):
        '''Returns the original image at an index in tensor form
        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.
        Argument:
            index (int): index of img to show
        Return:
            tensorized version of img, squeezed
        '''
        return torch.Tensor(self.pull_image(index)).unsqueeze_(0)
# main entry point
if __name__ == '__main__':
    img_size = 224
    # convert to tensor and normalize to [0, 1]
    normalize = transforms.Normalize(mean=[0.14300402, 0.1434545, 0.14277956],
                                     # mean/std computed on the casia-surf validation set
                                     std=[0.10050353, 0.100842826, 0.10034215])
    # Note: pull_item() calls self.transform(img, boxes, labels), so a detection-style
    # augmentation (e.g. SSDAugmentation from the ssd.pytorch repo) is expected here;
    # the torchvision Compose below only covers image-level preprocessing.
    train_dataset = VOCDetection(
        root=img_path,
        # torchvision's transforms.Compose chains several transforms together
        transform=transforms.Compose([
            transforms.RandomResizedCrop(img_size),   # random resized crop
            transforms.RandomHorizontalFlip(),        # horizontal flip with probability p
            transforms.ToTensor(),                    # convert to tensor
            ColorAugmentation(),
            normalize,
        ])
    )
2. Model training
I used the PyTorch framework, so I picked this open-source project: https://github.com/amdegroot/ssd.pytorch. Since Approach 1 was abandoned in the end, I will not go into more detail here.
Approach 2
I chose the keras-yolo3-Mobilenet approach. The open-source project is at:
https://github.com/Adamdad/keras-YOLOv3-mobilenet.
MobileNet's key innovation is the depthwise separable convolution; compared with VGG16, it reduces the amount of computation by roughly a factor of 30 at the cost of only a small drop in accuracy (a small sketch follows). YOLOv3's main innovations are the DarkNet-53 backbone, predictions across three scales, and a multi-label classifier built from logistic regressions.
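To make the saving concrete, here is a small Keras sketch (an illustration only, not part of the project code) comparing a standard 3x3 convolution with its depthwise separable counterpart at one layer:
from keras.layers import Input, Conv2D, DepthwiseConv2D
from keras.models import Model

inputs = Input(shape=(32, 32, 64))

# standard 3x3 convolution: 64 -> 128 channels
standard = Model(inputs, Conv2D(128, (3, 3), padding='same')(inputs))

# depthwise separable version: a 3x3 depthwise conv followed by a 1x1 pointwise conv
x = DepthwiseConv2D((3, 3), padding='same')(inputs)
separable = Model(inputs, Conv2D(128, (1, 1), padding='same')(x))

print(standard.count_params())   # 73,856 parameters
print(separable.count_params())  # 8,960 parameters, roughly 8x fewer at this layer size
At this single layer the separable version already needs about 8x fewer parameters; accumulated over a whole network, this is where MobileNet's large reduction in computation comes from.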
Experiments based on the open dataset:
The training script:
import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
import os
from yolo3.model_Mobilenet import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss
from yolo3.utils import get_random_data
import argparse
from voc_annotation import mask_convert
def _main():
    parser = argparse.ArgumentParser(description="training a mask model in modelarts")
    parser.add_argument("--train_url", default='logs/maskMobilenet/002_Mobilenet_finetune/', type=str)
    parser.add_argument("--data_url", default="D:/code/mask_detection/data/MASK_MERGE/", type=str)
    parser.add_argument("--num_gpus", default=0, type=int)
    args = parser.parse_args()

    num_classes = 2
    # get_anchors, create_model and create_tiny_model are defined elsewhere in train_Mobilenet.py
    anchors = get_anchors()
    # print(anchors)
    # print(type(anchors))
    os.environ["CUDA_VISIBLE_DEVICES"] = str(args.num_gpus)
    classes = ["no_mask", "yes_mask"]
    input_shape = (320, 320)  # multiple of 32, hw

    is_tiny_version = len(anchors) == 6  # default setting
    if is_tiny_version:
        print("tiny")
        model = create_tiny_model(input_shape, anchors, num_classes,
                                  freeze_body=2)
    else:
        model = create_model(input_shape, anchors, num_classes, load_pretrained=False,
                             weights_path=args.data_url + '/trained_weights_final.h5',
                             freeze_body=2)  # make sure you know what you freeze

    logging = TensorBoard(log_dir=args.train_url)
    checkpoint = ModelCheckpoint(args.train_url + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                                 monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

    # train_path = os.path.join(args.train_url, "2020_mask.txt")
    # with open(train_path) as t_f:
    #     t_lines = t_f.readlines()
    t_lines = mask_convert(args.data_url, classes)
    np.random.seed(10101)
    np.random.shuffle(t_lines)
    np.random.seed(None)
    sep_num = int(0.8 * len(t_lines))
    v_lines = t_lines[sep_num:]
    t_lines = t_lines[:sep_num]
    num_train = len(t_lines)
    # with open(val_path) as v_f:
    #     v_lines = v_f.readlines()
    np.random.seed(10010)
    np.random.shuffle(v_lines)
    np.random.seed(None)
    num_val = len(v_lines)
    # ... the model.compile() / fit_generator() part of the original script follows here
    #     (omitted in this excerpt)

if __name__ == '__main__':
    _main()
With the locally trained model, the script yolo_Mobilenet.py runs prediction on test images:
import colorsys
import os
from timeit import default_timer as timer
import tensorflow as tf
import numpy as np
from keras import backend as K
from keras.models import load_model
from keras.layers import Input
from PIL import Image, ImageFont, ImageDraw
from yolo3.model_Mobilenet import yolo_eval, yolo_body, tiny_yolo_body
from yolo3.utils import letterbox_image
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from keras.utils import multi_gpu_model
gpu_num=1
class YOLO(object):
    def __init__(self):
        self.model_path = 'logs/maskMobilenet/001_Mobilenet_finetune/trained_weights_final.h5'  # model path or trained weights path
        self.anchors_path = 'model_data/yolo_anchors.txt'
        self.classes_path = 'model_data/mask_classes.txt'
        self.score = 0.3
        self.iou = 0.45
        self.class_names = self._get_class()
        self.anchors = self._get_anchors()
        self.sess = K.get_session()
        self.model_image_size = (320, 320)  # fixed size or (None, None), hw
        self.boxes, self.scores, self.classes = self.generate()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return np.array(anchors).reshape(-1, 2)

    def generate(self):
        '''to generate the bounding boxes'''
        model_path = os.path.expanduser(self.model_path)
        assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'

        # Load model, or construct model and load weights.
        num_anchors = len(self.anchors)
        num_classes = len(self.class_names)
        is_tiny_version = num_anchors == 6  # default setting
        try:
            self.yolo_model = load_model(model_path, compile=False)
        except:
            self.yolo_model = tiny_yolo_body(Input(shape=(None, None, 3)), num_anchors // 2, num_classes) \
                if is_tiny_version else yolo_body(Input(shape=(None, None, 3)), num_anchors // 3, num_classes)
            self.yolo_model.load_weights(self.model_path)  # make sure model, anchors and classes match
        else:
            assert self.yolo_model.layers[-1].output_shape[-1] == \
                num_anchors / len(self.yolo_model.output) * (num_classes + 5), \
                'Mismatch between model and given anchor and class sizes'
        print('{} model, anchors, and classes loaded.'.format(model_path))

        # Generate colors for drawing bounding boxes.
        # hsv_tuples = [(x / len(self.class_names), 1., 1.)
        #               for x in range(len(self.class_names))]
        # self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        # self.colors = list(
        #     map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
        #         self.colors))
        # np.random.seed(10101)  # Fixed seed for consistent colors across runs.
        # np.random.shuffle(self.colors)  # Shuffle colors to decorrelate adjacent classes.
        # np.random.seed(None)  # Reset seed to default.

        # Generate output tensor targets for filtered bounding boxes.
        self.input_image_shape = K.placeholder(shape=(2, ))
        if gpu_num >= 2:
            self.yolo_model = multi_gpu_model(self.yolo_model, gpus=gpu_num)
        boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                                           len(self.class_names), self.input_image_shape,
                                           score_threshold=self.score, iou_threshold=self.iou)
        # default arg
        # self.yolo_model -> 'model_data/yolo.h5'
        # self.anchors -> 'model_data/yolo_anchors.txt' -> 9 scales for anchors
        return boxes, scores, classes
def mask_detect(yolo, mainFolder='D:/Code/mask_detection/data/test'):
    import json
    fold_list = range(1, 15)
    for i in fold_list:
        foldname = mainFolder
        list = os.listdir(foldname)  # list all files and sub-directories in the folder
        json_all = {}
        json_f = open('annotation_YOLOv3.json', 'w')
        for i in range(0, len(list)):
            name, ext = os.path.splitext(list[i])
            if ext == '.jpg':
                print(list[i])
                json_pic = {}
                annotation = []
                image = Image.open(foldname + '/' + list[i])
                rects = yolo.detect_image(image)
                for rect in rects:
                    score, x1, y1, x2, y2 = float(rect['score']), int(float(rect['x1'])), int(float(rect['y1'])), int(
                        float(rect['x2'])), int(float(rect['y2']))
                    if float(rect['score']) > 0.5:
                        label = "yes_mask"
                    else:
                        label = "no_mask"
                    bbox = {"category": "ModelArts",
                            "id": 0,
                            "shape": ["Box", 1],
                            "label": label,
                            "x": x1,
                            "y": y1,
                            "width": x2 - x1,
                            "height": y2 - y1,
                            "score": score}
                    annotation.append(bbox)
                json_pic["annotations"] = annotation
                json_pic["height"] = 480
                json_pic["name"] = list[i]
                json_pic["width"] = 640
                json_all[list[i]] = json_pic
        json_f.write(json.dumps(json_all, indent=4))
        json_f.close()
    yolo.close_session()


if __name__ == '__main__':
    mask_detect(YOLO())
    # detect_test(YOLO(), json_name='../mrsub/mrsub_test.json', test_out_json='mobilenet_train_bw_test_mrsub.json', data_dst='../mrsub/')
    # detect_test_draw(YOLO(), json_name='D:/Code/mask_detection/keras-YOLOv3-mobilenet-master/annotation_YOLOv3.json', test_pic='D:/Code/mask_detection/data/test')
The predictions are written to annotation_YOLOv3.json; a sample entry looks like this:
{
"no_1.jpg": {
"annotations": [
{
"category": "ModelArts",
"id": 0,
"shape": [
"Box",
1
],
"label": "no_mask",
"x": 278,
"y": 82,
"width": 35,
"height": 62,
"score": 0.48977488
}
],
"height": 480,
"name": "no_1.jpg",
"width": 640
}
}
To make the results easier to inspect, I drew them onto the images, as shown below:
The model's accuracy is not very high; some faces were not detected at all.
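For reference, the drawing step can be reproduced from annotation_YOLOv3.json with a few lines of PIL (a sketch only; the image directory is an example, and the repo's own drawing helper differs):
import json
from PIL import Image, ImageDraw

with open('annotation_YOLOv3.json') as f:
    results = json.load(f)

for pic_name, pic in results.items():
    img = Image.open('D:/Code/mask_detection/data/test/' + pic_name)  # example path
    draw = ImageDraw.Draw(img)
    for ann in pic['annotations']:
        x, y, w, h = ann['x'], ann['y'], ann['width'], ann['height']
        color = 'green' if ann['label'] == 'yes_mask' else 'red'
        draw.rectangle([x, y, x + w, y + h], outline=color)
        draw.text((x, max(y - 12, 0)), '{} {:.2f}'.format(ann['label'], ann['score']), fill=color)
    img.save('vis_' + pic_name)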
Approach 3
This time I used the ModelArts training platform and uploaded the already-debugged code through an OBS bucket (I also recommend trying ModelArts' Notebook way of training models), as shown in the figure below:
Then I started a Notebook. Instead of writing code in Jupyter, I synchronized the resources from the OBS bucket and launched a GPU image through the Notebook:
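If you prefer to do that synchronization in code rather than through the Notebook's sync button, the ModelArts SDK's download_data call (the counterpart of the upload_data call used at the end of this article) can pull files from OBS into the notebook workspace. A minimal sketch, where the /code/ sub-path is an assumption:
from modelarts.session import Session

session = Session()
# Pull the code/data from the OBS bucket into the notebook workspace.
# Bucket and local paths are examples; adjust them to your own OBS layout.
session.download_data(bucket_path="/mask-detection-modelarts-test/code/",
                      path="/home/ma-user/work/")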
Then run the training code to start the training job:
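In the notebook this amounts to invoking the training script with the data and output directories; a sketch of the command, with illustrative paths:
# Paths are examples; point them at the directories synced from OBS.
python train_Mobilenet.py \
    --data_url /home/ma-user/work/MASK_MERGE/ \
    --train_url /home/ma-user/work/log/ \
    --num_gpus 0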
A fragment of the training output looks like this:
2020-04-07 18:58:14.497319: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
7/7 [==============================] - 17s 2s/step - loss: 4226.4421 - val_loss: 22123.3750
Epoch 2/50
7/7 [==============================] - 6s 855ms/step - loss: 1083.1558 - val_loss: 1734.1427
Epoch 3/50
7/7 [==============================] - 6s 864ms/step - loss: 521.8567 - val_loss: 455.0971
Epoch 4/50
7/7 [==============================] - 6s 851ms/step - loss: 322.8907 - val_loss: 193.3107
Epoch 5/50
7/7 [==============================] - 6s 841ms/step - loss: 227.7257 - val_loss: 150.8902
Epoch 6/50
7/7 [==============================] - 6s 851ms/step - loss: 179.0605 - val_loss: 154.9351
Epoch 7/50
7/7 [==============================] - 6s 868ms/step - loss: 150.4297 - val_loss: 147.3101
Epoch 8/50
7/7 [==============================] - 8s 1s/step - loss: 129.5681 - val_loss: 144.8283
After the model has been generated, create a Python script with the following code to copy the model file to the OBS bucket:
from modelarts.session import Session
session = Session()
session.upload_data(bucket_path="/mask-detection-modelarts-test/run/log/", path="/home/ma-user/work/log/trained_weights_final.h5")
One of the test results is shown in the figure below: