Using ModelArts, Part 2: Training a Mask-Detection Model

麦克周, posted 2020/03/31 18:40:13
【Abstract】 Any AI platform is designed, among other things, to raise development efficiency and shorten the algorithm iteration cycle. In this article I use the ModelArts platform to train a model. During the epidemic, wearing a mask is the most effective way for the public to avoid being infected with the novel coronavirus, protecting yourself and others at the same time. So today's topic is training a mask-wearing recognition model.


Basic approach

A recognition task like this relies on object detection. The goal of object detection is to find all objects of interest in an image and determine their positions and sizes. Because objects come in different shapes, sizes, and numbers, and can occlude one another, object detection has long been one of the most challenging problems in computer vision. Among mainstream algorithms, the best performers are SSD (Single Shot MultiBox Detector; "Single Shot" here refers to single-object detection, and the "Box" in "MultiBox" is like the viewfinder frame we use when taking a photo: it only looks at what is inside the frame and ignores everything outside it. Create multiple boxes, merge the single-object detection result from each box, and you get multi-object detection. In other words, SSD splits the image into N patches, runs independent single-object detection on each patch, and finally merges the per-patch results.) and YOLO. I trained the mask-recognition model with both approaches. Whether you use SSD or YOLO, the detection process can be decomposed into two independent operations:

- Location: frame the object with a rectangle (bounding box). A bounding box is usually given as four integers: either the x and y coordinates of the rectangle's top-left and bottom-right corners, or the top-left corner plus the rectangle's width and height.

- Classification: identify the (largest) object inside the bounding box (see the sketch after this list).
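To make these two operations concrete, here is a minimal sketch of what a single detection result typically carries; the coordinates, label, and score are made-up illustrative values, not output from the model trained in this article:

# A single detection = location (bounding box) + classification (label + confidence).
# Coordinates use the common (xmin, ymin, xmax, ymax) pixel convention; values are illustrative.
detection = {
    "bbox": (278, 82, 313, 144),  # xmin, ymin, xmax, ymax
    "label": "yes_mask",          # classification result
    "score": 0.91,                # classification confidence
}

# Multi-object detection simply returns a list of such results, one per detected object.
detections = [detection]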

 

First we need a dataset. The mask image dataset is available here: https://modelarts-labs-bj4.obs.cn-north-4.myhuaweicloud.com/case_zoo/mask_detect/datasets/mask_detect_datasets.zip
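If you want to fetch and unpack the dataset locally first, a minimal sketch using only the Python standard library could look like the following; the local paths are just examples:

import urllib.request
import zipfile

url = "https://modelarts-labs-bj4.obs.cn-north-4.myhuaweicloud.com/case_zoo/mask_detect/datasets/mask_detect_datasets.zip"
zip_path = "mask_detect_datasets.zip"      # example local file name
urllib.request.urlretrieve(url, zip_path)  # download the archive
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("./data")                # unpack the images and XML annotations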

Notes on the dataset format:

- The dataset follows the Pascal VOC layout: all images are JPEG files and all annotations are XML files, organized strictly according to the Pascal VOC directory conventions (a quick way to inspect an annotation is sketched after this list). The data is split into a training set of 291 images and a test set of 16 images.

- The training set is annotated; the test set is not. During training, the training set therefore has to be further split into a training subset and a validation subset.
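As a quick sanity check on the Pascal VOC annotations, a single XML file can be parsed with the standard library. The file name below is only an example; the full loader used for training appears later in this article:

import xml.etree.ElementTree as ET

root = ET.parse("./data/train_label/mask_0001.xml").getroot()  # example annotation file
for obj in root.iter("object"):
    name = obj.find("name").text  # class name, e.g. yes_mask / no_mask
    box = obj.find("bndbox")
    coords = [int(box.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax")]
    print(name, coords)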

My idea is to train the model not only on the ModelArts platform but also on top of open-source code, and then compare the two approaches' pros and cons and model accuracy. The concrete plan:

1. Start from an open-source project, modify the data loading, data conversion, and training-parameter code, train the model with Huawei Cloud's compute power, and evaluate the resulting model on the test set;

2. Train and evaluate the model again using the ModelArts development toolkit;

3. Compare the two models' results and the two development workflows.

Based on an open-source project

Plan 1

Use a VGG network (this plan was eventually abandoned because the network required too much memory).

1. First, load the VOC-format data and convert it to PyTorch tensors.

The code is as follows:

from PIL import Image
import numpy as np
import os
from torch.utils.data import Dataset
import math
import cv2
import torchvision
import os.path as osp
import torch
import torch.utils.data as data
import xml.etree.ElementTree as ET
import torchvision.transforms as transforms
from utils.data_aug import ColorAugmentation


img_path = "./data/train_img"
label_path = './data/train_label'
img_list = [os.path.join(img_path, x) for x in os.listdir(img_path)]  # build the image list from the directory contents

percent = 0.7
sep_num = int(len(img_list) * percent)
train_list = img_list[:sep_num]
test_list = img_list[sep_num:]

MASK_CLASSES = ('no_mask', 'yes_mask')


class VOCAnnotationTransform(object):
    """Transforms a VOC annotation into a Tensor of bbox coords and label index
    Initialized with a dictionary lookup of classnames to indexes
    Arguments:
        class_to_ind (dict, optional): dictionary lookup of classnames -> indexes
            (default: alphabetic indexing of VOC's 20 classes)
        keep_difficult (bool, optional): keep difficult instances or not
            (default: False)
        height (int): height
        width (int): width
    """

    def __init__(self, class_to_ind=None, keep_difficult=False):
        self.class_to_ind = class_to_ind or dict(
            zip(MASK_CLASSES, range(len(MASK_CLASSES))))
        self.keep_difficult = keep_difficult

    def __call__(self, target, width, height):
        """
        Arguments:
            target (annotation) : the target annotation to be made usable
                will be an ET.Element
        Returns:
            a list containing lists of bounding boxes  [bbox coords, class name]
        """
        res = []
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')

            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                cur_pt = int(bbox.find(pt).text) - 1
                # scale height or width
                cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]
            bndbox.append(label_idx)
            res += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
            # img_id = target.find('filename').text[:-4]

        return res  # [[xmin, ymin, xmax, ymax, label_ind], ... ]


class VOCDetection(data.Dataset):
    """VOC Detection Dataset Object
    input is image, target is annotation
    Arguments:
        root (string): filepath to VOCdevkit folder.
        image_set (string): imageset to use (eg. 'train', 'val', 'test')
        transform (callable, optional): transformation to perform on the
            input image
        target_transform (callable, optional): transformation to perform on the
            target `annotation`
            (eg: take in caption string, return tensor of word indices)
        dataset_name (string, optional): which dataset to load
            (default: 'VOC2007')
    """

    def __init__(self, root,
                 # image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
                 transform=None, target_transform=VOCAnnotationTransform(),
                 dataset_name='ModelArtsMask'):
        self.root = root
        # self.image_set = image_sets
        self.transform = transform
        self.target_transform = target_transform
        self.name = dataset_name
        # self._annopath = osp.join('%s', 'Annotations', '%s.xml')
        # self._imgpath = osp.join('%s', 'JPEGImages', '%s.jpg')
        self._annopath = label_path
        self._imgpath = img_path
        self.ids = img_list
        # for (year, name) in image_sets:
        #     rootpath = osp.join(self.root, 'VOC' + year)
        #     for line in open(osp.join(rootpath, 'ImageSets', 'Main', name + '.txt')):
        #         self.ids.append((rootpath, line.strip()))

    def __getitem__(self, index):
        im, gt, h, w = self.pull_item(index)

        return im, gt

    def __len__(self):
        return len(self.ids)

    def pull_item(self, index):
        img_id = self.ids[index]

        target = ET.parse(self._annopath % img_id).getroot()
        img = cv2.imread(self._imgpath % img_id)
        height, width, channels = img.shape

        if self.target_transform is not None:
            target = self.target_transform(target, width, height)

        if self.transform is not None:
            target = np.array(target)
            img, boxes, labels = self.transform(img, target[:, :4], target[:, 4])
            # to rgb
            img = img[:, :, (2, 1, 0)]
            # img = img.transpose(2, 0, 1)
            target = np.hstack((boxes, np.expand_dims(labels, axis=1)))
        return torch.from_numpy(img).permute(2, 0, 1), target, height, width
        # return torch.from_numpy(img), target, height, width

    def pull_image(self, index):
        '''Returns the original image object at index in PIL form
        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.
        Argument:
            index (int): index of img to show
        Return:
            PIL img
        '''
        img_id = self.ids[index]
        return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)

    def pull_anno(self, index):
        '''Returns the original annotation of image at index
        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.
        Argument:
            index (int): index of img to get annotation of
        Return:
            list:  [img_id, [(label, bbox coords),...]]
                eg: ('001718', [('dog', (96, 13, 438, 332))])
        '''
        img_id = self.ids[index]
        anno = ET.parse(self._annopath % img_id).getroot()
        gt = self.target_transform(anno, 1, 1)
        return img_id[1], gt

    def pull_tensor(self, index):
        '''Returns the original image at an index in tensor form
        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.
        Argument:
            index (int): index of img to show
        Return:
            tensorized version of img, squeezed
        '''
        return torch.Tensor(self.pull_image(index)).unsqueeze_(0)


# main entry point
if __name__ == '__main__':
    img_size = 224
    # normalize the image tensor with per-channel mean and std
    normalize = transforms.Normalize(mean=[0.14300402, 0.1434545, 0.14277956],
                                     # computed on the casia-surf validation set
                                     std=[0.10050353, 0.100842826, 0.10034215])
    train_dataset = VOCDetection(
        img_path,
        # transforms.Compose chains several transforms together
        transform=transforms.Compose([
            transforms.RandomResizedCrop(img_size),   # random crop with random aspect ratio
            transforms.RandomHorizontalFlip(),        # horizontal flip with probability p
            transforms.ToTensor(),                    # convert to tensor
            ColorAugmentation(),
            normalize,
        ])
    )
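As a usage note: each image can contain a different number of boxes, so the default batching in torch.utils.data.DataLoader cannot stack the targets. A small custom collate_fn is the usual workaround; the sketch below assumes the transform resizes every image to the same size, and the batch size and worker count are arbitrary:

def detection_collate(batch):
    # Stack images into one tensor; keep the variable-length targets as a list.
    imgs, targets = [], []
    for img, gt in batch:
        imgs.append(img)
        targets.append(torch.FloatTensor(gt))
    return torch.stack(imgs, 0), targets

train_loader = data.DataLoader(train_dataset, batch_size=8, shuffle=True,
                               num_workers=2, collate_fn=detection_collate)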

2. Model training

I was using the PyTorch framework, so I picked this open-source project: https://github.com/amdegroot/ssd.pytorch. Since Plan 1 was abandoned, I won't go into further detail here.


Plan 2

I chose the keras-yolo3-Mobilenet approach; the open-source project is at:

https://github.com/Adamdad/keras-YOLOv3-mobilenet

MobileNet's key innovation is the depthwise separable convolution; compared with VGG16 it cuts the computation by roughly 30x with only a small loss in accuracy. YOLOv3's key innovations are the DarkNet-53 backbone, predictions across scales, and logistic regression layers for multi-label, multi-class prediction.
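To see where MobileNet's savings come from, here is a minimal Keras sketch contrasting a standard convolution with its depthwise separable counterpart; the layer sizes are arbitrary and chosen only to make the parameter counts easy to compare:

from keras.layers import Conv2D, DepthwiseConv2D, Input
from keras.models import Model

inputs = Input(shape=(224, 224, 32))

# Standard 3x3 convolution: 3*3*32*64 = 18,432 weights (bias ignored).
standard = Conv2D(64, (3, 3), padding='same')(inputs)

# Depthwise separable: a 3x3 filter per input channel (3*3*32 = 288 weights)
# followed by a 1x1 pointwise convolution (1*1*32*64 = 2,048 weights).
x = DepthwiseConv2D((3, 3), padding='same')(inputs)
separable = Conv2D(64, (1, 1), padding='same')(x)

Model(inputs, standard).summary()
Model(inputs, separable).summary()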

Experimental results on the open-source dataset:

[Image: experimental results on the open-source dataset]

The training script:

import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
import os
from yolo3.model_Mobilenet import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss
from yolo3.utils import get_random_data
import argparse
from voc_annotation import mask_convert


def _main():
    parser = argparse.ArgumentParser(
        description="training a maskmodel in modelarts")
    parser.add_argument(
        "--train_url", default='logs/maskMobilenet/002_Mobilenet_finetune/', type=str)
    parser.add_argument(
        "--data_url", default="D:/code/mask_detection/data/MASK_MERGE/", type=str)
    parser.add_argument(
        "--num_gpus", default=0, type=int)
    args = parser.parse_args()
    num_classes = 2
    anchors = get_anchors()
    # print(anchors)
    # print(type(anchors))
    os.environ["CUDA_VISIBLE_DEVICES"] = str(args.num_gpus)

    classes = ["no_mask", "yes_mask"]

    input_shape = (320, 320)  # multiple of 32, hw

    is_tiny_version = len(anchors) == 6  # default setting
    if is_tiny_version:
        print("tiny")
        model = create_tiny_model(input_shape, anchors, num_classes,
                                  freeze_body=2)
    else:
        model = create_model(input_shape, anchors, num_classes, load_pretrained=False,
                             weights_path=args.data_url + '/trained_weights_final.h5',
                             freeze_body=2)  # make sure you know what you freeze

    logging = TensorBoard(log_dir=args.train_url)

    checkpoint = ModelCheckpoint(args.train_url + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                                 monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

    # train_path = os.path.join(args.train_url, "2020_mask.txt")
    # with open(train_path) as t_f:
    #     t_lines = t_f.readlines()
    t_lines = mask_convert(args.data_url, classes)
    np.random.seed(10101)
    np.random.shuffle(t_lines)
    np.random.seed(None)
    sep_num = int(0.8 * len(t_lines))
    v_lines = t_lines[sep_num:]
    t_lines = t_lines[:sep_num]
    num_train = len(t_lines)
    # with open(val_path) as v_f:
    #     v_lines = v_f.readlines()
    np.random.seed(10010)
    np.random.shuffle(v_lines)
    np.random.seed(None)
    num_val = len(v_lines)


if __name__ == '__main__':
    _main()
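The training logic in this excerpt stops right after preparing the train/validation split. In the upstream keras-YOLOv3-mobilenet project, the same script then compiles the model with the custom YOLO loss and trains it with a data generator, roughly as sketched below (continuing inside _main()). The data_generator_wrapper helper, learning rate, and batch size follow that project's conventions and are shown here only as an illustration:

    model.compile(optimizer=Adam(lr=1e-3),
                  loss={'yolo_loss': lambda y_true, y_pred: y_pred})  # the loss is computed inside the model graph
    batch_size = 16
    model.fit_generator(
        data_generator_wrapper(t_lines, batch_size, input_shape, anchors, num_classes),
        steps_per_epoch=max(1, num_train // batch_size),
        validation_data=data_generator_wrapper(v_lines, batch_size, input_shape, anchors, num_classes),
        validation_steps=max(1, num_val // batch_size),
        epochs=50,
        callbacks=[logging, checkpoint, reduce_lr, early_stopping])
    model.save_weights(args.train_url + 'trained_weights_final.h5')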


With the model trained locally, use the yolo_Mobilenet.py script to run predictions on images:

import colorsys
import os
from timeit import default_timer as timer
import tensorflow as tf
import numpy as np
from keras import backend as K
from keras.models import load_model
from keras.layers import Input
from PIL import Image, ImageFont, ImageDraw

from yolo3.model_Mobilenet import yolo_eval, yolo_body, tiny_yolo_body
from yolo3.utils import letterbox_image
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from keras.utils import multi_gpu_model
gpu_num = 1


class YOLO(object):
    def __init__(self):
        self.model_path = 'logs/maskMobilenet/001_Mobilenet_finetune/trained_weights_final.h5'  # model path or trained weights path
        self.anchors_path = 'model_data/yolo_anchors.txt'
        self.classes_path = 'model_data/mask_classes.txt'
        self.score = 0.3
        self.iou = 0.45
        self.class_names = self._get_class()
        self.anchors = self._get_anchors()
        self.sess = K.get_session()
        self.model_image_size = (320, 320)  # fixed size or (None, None), hw
        self.boxes, self.scores, self.classes = self.generate()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return np.array(anchors).reshape(-1, 2)

    def generate(self):
        '''to generate the bounding boxes'''
        model_path = os.path.expanduser(self.model_path)
        assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'

        # Load model, or construct model and load weights.
        num_anchors = len(self.anchors)
        num_classes = len(self.class_names)
        is_tiny_version = num_anchors == 6  # default setting
        try:
            self.yolo_model = load_model(model_path, compile=False)
        except:
            self.yolo_model = tiny_yolo_body(Input(shape=(None, None, 3)), num_anchors // 2, num_classes) \
                if is_tiny_version else yolo_body(Input(shape=(None, None, 3)), num_anchors // 3, num_classes)
            self.yolo_model.load_weights(self.model_path)  # make sure model, anchors and classes match
        else:
            assert self.yolo_model.layers[-1].output_shape[-1] == \
                num_anchors / len(self.yolo_model.output) * (num_classes + 5), \
                'Mismatch between model and given anchor and class sizes'

        print('{} model, anchors, and classes loaded.'.format(model_path))

        # Generate colors for drawing bounding boxes.
        # hsv_tuples = [(x / len(self.class_names), 1., 1.)
        #               for x in range(len(self.class_names))]
        # self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        # self.colors = list(
        #     map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
        #         self.colors))
        # np.random.seed(10101)  # Fixed seed for consistent colors across runs.
        # np.random.shuffle(self.colors)  # Shuffle colors to decorrelate adjacent classes.
        # np.random.seed(None)  # Reset seed to default.

        # Generate output tensor targets for filtered bounding boxes.
        self.input_image_shape = K.placeholder(shape=(2,))
        if gpu_num >= 2:
            self.yolo_model = multi_gpu_model(self.yolo_model, gpus=gpu_num)
        boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                                           len(self.class_names), self.input_image_shape,
                                           score_threshold=self.score, iou_threshold=self.iou)
        # default arg
        # self.yolo_model->'model_data/yolo.h5'
        # self.anchors->'model_data/yolo_anchors.txt'-> 9 scales for anchors
        return boxes, scores, classes


def mask_detect(yolo, mainFolder='D:/Code/mask_detection/data/test'):
    import json
    fold_list = range(1, 15)

    for i in fold_list:
        foldname = mainFolder
        list = os.listdir(foldname)  # list all directories and files in the folder
        json_all = {}
        json_f = open('annotation_YOLOv3.json', 'w')
        for i in range(0, len(list)):
            name, ext = os.path.splitext(list[i])
            if ext == '.jpg':
                print(list[i])
                json_pic = {}
                annotation = []
                image = Image.open(foldname + '/' + list[i])
                rects = yolo.detect_image(image)
                for rect in rects:
                    score, x1, y1, x2, y2 = float(rect['score']), int(float(rect['x1'])), int(float(rect['y1'])), int(
                        float(rect['x2'])), int(float(rect['y2']))
                    if float(rect['score']) > 0.5:
                        label = "yes_mask"
                    else:
                        label = "no_mask"
                    bbox = {"category": "ModelArts",
                            "id": 0,
                            "shape": ["Box", 1],
                            "label": label,
                            "x": x1,
                            "y": y1,
                            "width": x2 - x1,
                            "height": y2 - y1,
                            "score": score}
                    annotation.append(bbox)
                json_pic["annotations"] = annotation
                json_pic["height"] = 480
                json_pic["name"] = list[i]
                json_pic["width"] = 640
                json_all[list[i]] = json_pic
        json_f.write(json.dumps(json_all, indent=4))
        json_f.close()
    yolo.close_session()


if __name__ == '__main__':
    mask_detect(YOLO())
    # detect_test(YOLO(), json_name='../mrsub/mrsub_test.json', test_out_json='mobilenet_train_bw_test_mrsub.json', data_dst='../mrsub/')
    # detect_test_draw(YOLO(), json_name='D:/Code/mask_detection/keras-YOLOv3-mobilenet-master/annotation_YOLOv3.json', test_pic='D:/Code/mask_detection/data/test')


 

The prediction results are written to annotation_YOLOv3.json, which looks like this:

{
    "no_1.jpg": {
        "annotations": [
            {
                "category": "ModelArts",
                "id": 0,
                "shape": [
                    "Box",
                    1
                ],
                "label": "no_mask",
                "x": 278,
                "y": 82,
                "width": 35,
                "height": 62,
                "score": 0.48977488
            }
        ],
        "height": 480,
        "name": "no_1.jpg",
        "width": 640
    }
}

To make the results easier to inspect, I drew them onto the images, as shown below:

[Image: prediction results drawn on a test image]
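For reference, a visualization like this can be reproduced from annotation_YOLOv3.json with Pillow; this is only a sketch, and the test-image folder path is an example:

import json
from PIL import Image, ImageDraw

with open('annotation_YOLOv3.json') as f:
    results = json.load(f)

for name, pic in results.items():
    img = Image.open('D:/Code/mask_detection/data/test/' + name)  # example test-image folder
    draw = ImageDraw.Draw(img)
    for ann in pic['annotations']:
        x, y, w, h = ann['x'], ann['y'], ann['width'], ann['height']
        draw.rectangle([x, y, x + w, y + h], outline='red')  # bounding box
        draw.text((x, max(0, y - 12)), '{} {:.2f}'.format(ann['label'], ann['score']), fill='red')
    img.save('vis_' + name)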



The model's accuracy is not very high; some objects were not detected at all.

Plan 3

For this plan I used the ModelArts training platform, uploading the already-debugged code to an OBS bucket (I also recommend trying the ModelArts Notebook way of training models), as shown below:

[Screenshots: uploading the debugged code to the OBS bucket]

Next, I started a Notebook. Instead of writing code in Jupyter, I synchronized the resources from the OBS bucket and launched a GPU image through the Notebook:

[Screenshots: creating the Notebook with a GPU image and synchronizing the OBS bucket resources]
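Inside a ModelArts Notebook, syncing code and data from the OBS bucket can also be done programmatically with the preinstalled moxing library; a minimal sketch, where the bucket and local paths are illustrative:

import moxing as mox

# Copy the debugged code and the dataset from OBS into the Notebook's working directory.
mox.file.copy_parallel('obs://mask-detection-modelarts-test/code/', '/home/ma-user/work/code/')
mox.file.copy_parallel('obs://mask-detection-modelarts-test/data/', '/home/ma-user/work/data/')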

Run the following code to launch the training job:

[Screenshot: code used to launch the training job]
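The launch code itself is only visible in the screenshot; in essence it runs the training script against the synced data, roughly like the sketch below. The script name and the paths are assumptions based on the earlier argparse defaults and the log directory used later:

import subprocess

# Hypothetical launch command: the script name and paths may differ from the screenshot.
subprocess.run(['python', 'train_Mobilenet.py',
                '--data_url', '/home/ma-user/work/data/MASK_MERGE/',
                '--train_url', '/home/ma-user/work/log/'],
               check=True)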

A snippet of the training output is shown below:

2020-04-07 18:58:14.497319: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
7/7 [==============================] - 17s 2s/step - loss: 4226.4421 - val_loss: 22123.3750
Epoch 2/50
7/7 [==============================] - 6s 855ms/step - loss: 1083.1558 - val_loss: 1734.1427
Epoch 3/50
7/7 [==============================] - 6s 864ms/step - loss: 521.8567 - val_loss: 455.0971
Epoch 4/50
7/7 [==============================] - 6s 851ms/step - loss: 322.8907 - val_loss: 193.3107
Epoch 5/50
7/7 [==============================] - 6s 841ms/step - loss: 227.7257 - val_loss: 150.8902
Epoch 6/50
7/7 [==============================] - 6s 851ms/step - loss: 179.0605 - val_loss: 154.9351
Epoch 7/50
7/7 [==============================] - 6s 868ms/step - loss: 150.4297 - val_loss: 147.3101
Epoch 8/50
7/7 [==============================] - 8s 1s/step - loss: 129.5681 - val_loss: 144.8283

Once the model has been generated, create a Python script with the following code to copy the model file back to the OBS bucket:

from modelarts.session import Session

session = Session()
session.upload_data(bucket_path="/mask-detection-modelarts-test/run/log/", path="/home/ma-user/work/log/trained_weights_final.h5")

Here is one of the test results:

[Image: a sample test result with the predicted mask bounding box]

