Fun with Instance Segmentation: Mask R-CNN
Preface
It has been a few days since my last post. Today I will walk through Mask R-CNN, a paper I read a while ago, and go through its code along the way.
What is Mask R-CNN?
Mask R-CNN was proposed by Kaiming He et al. It is a two-stage algorithm built on Faster R-CNN that performs not only object detection but also high-quality, pixel-level segmentation. Its main idea is to add a mask (segmentation) branch on top of the original Faster R-CNN detection head.
Advantages of Mask R-CNN
1. It uses an FPN (feature pyramid network). In earlier detectors such as Fast R-CNN, all RoI operations act on the final feature map. That is fine for large objects, but accuracy drops for small ones: an RoI is mapped onto a feature map simply by dividing its image coordinates by the stride, so after repeated convolution and pooling a small RoI shrinks to almost nothing (or nothing at all) and its semantic information is lost. To handle detection across scales, the FPN was introduced: it naturally reuses the CNN's own layers, fusing high-resolution shallow layers with deep layers so that the high-resolution maps also carry strong semantics. Below is the over-used FPN figure.
2. It uses ROIAlign. Suppose a region proposal in the original image is 665×665. Mapped to the feature map (stride 32) its size is 665/32 = 20.78, i.e. 20.78×20.78. Instead of rounding to an integer as RoI Pooling does, ROIAlign keeps the floating-point coordinates and samples values with bilinear interpolation. The problem with RoI Pooling is that if its output is 7×7 but the RoI on the feature map is 8×8, input and output pixels cannot correspond one-to-one: some output cells cover one input cell and others two, so they carry different amounts of information, and the output coordinates no longer align with the input. A small numeric sketch of the difference follows this list.
3. It adds a segmentation branch that decouples mask prediction from class prediction: the mask branch only predicts masks, and classification is left to a separate branch. This differs from the original FCN, which predicts the mask and the class it belongs to at the same time.
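To make the quantization point concrete, here is a small self-contained sketch (not from the repo) comparing RoI Pooling's rounding with ROIAlign-style bilinear sampling at a single float coordinate:

import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at float coordinates with bilinear interpolation."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feature.shape[0] - 1), min(x0 + 1, feature.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feature[y0, x0] * (1 - dy) * (1 - dx) +
            feature[y0, x1] * (1 - dy) * dx +
            feature[y1, x0] * dy * (1 - dx) +
            feature[y1, x1] * dy * dx)

feature = np.random.rand(32, 32).astype(np.float32)
y = x = 20.78  # the float coordinate from the 665/32 example above
quantized = feature[int(round(y)), int(round(x))]  # RoI Pooling: round first, then read
aligned = bilinear_sample(feature, y, x)           # ROIAlign: interpolate at 20.78 directly
print(quantized, aligned)

ROIAlign does this for several sample points inside each output bin and averages them, so no coordinate is ever rounded.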
GitHub
https://github.com/yanjingke/mask-rcnn-keras
How Mask R-CNN is implemented
1. Backbone network
The paper proposes ResNet+FPN as the feature-extraction network, which achieves state-of-the-art results. ResNet-101 is used here. ResNet-101 is built from conv blocks and identity blocks: a conv block changes the spatial size between input and output, while an identity block keeps input and output the same size.
Stage 1 (C1) applies one plain convolution and a max-pooling layer, compressing height and width by 4× and producing 64 channels.
Stage 2 (C2) applies one conv_block and two identity_blocks; height and width remain compressed 4×, with 256 channels.
Stage 3 (C3) applies one conv_block and three identity_blocks, compressing height and width 8×, with 512 channels.
Stage 4 (C4) applies one conv_block and 22 identity_blocks, compressing height and width 16×, with 1024 channels.
Stage 5 (C5) applies one conv_block and two identity_blocks, compressing height and width 32×, with 2048 channels.
The feature pyramid then combines C2-C5 through 1×1 lateral convolutions and upsampling to produce P2-P5 (plus P6, a downsampled copy of P5 used only by the RPN).
ResNet-101 code (the left half of the architecture diagram):
from keras.layers import ZeroPadding2D, Conv2D, MaxPooling2D, BatchNormalization, Activation, Add
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # 1x1 convolution to reduce the channel count
    x = Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
               use_bias=use_bias)(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x, training=train_bn)
    x = Activation('relu')(x)

    # kernel_size x kernel_size convolution
    x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
               name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x, training=train_bn)
    x = Activation('relu')(x)

    # 1x1 convolution to restore the channel count
    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
               use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x, training=train_bn)

    # Residual connection: input and output have identical shapes
    x = Add()([x, input_tensor])
    x = Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Strided 1x1 convolution: this is where the spatial size changes
    x = Conv2D(nb_filter1, (1, 1), strides=strides,
               name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
               name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
               use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x, training=train_bn)

    # Project the shortcut so its shape matches the main path
    shortcut = Conv2D(nb_filter3, (1, 1), strides=strides,
                      name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = Add()([x, shortcut])
    x = Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
def get_resnet(input_image, stage5=False, train_bn=True):
    # Stage 1
    x = ZeroPadding2D((3, 3))(input_image)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNormalization(name='bn_conv1')(x, training=train_bn)
    x = Activation('relu')(x)
    # Height/4, Width/4, 64
    C1 = x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    # Height/4, Width/4, 256
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    # Height/8, Width/8, 512
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = 22
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    # Height/16, Width/16, 1024
    C4 = x
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        # Height/32, Width/32, 2048
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
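A minimal usage sketch of the backbone (the 512×512 input shape is just an illustrative assumption):

from keras.layers import Input

input_image = Input(shape=[512, 512, 3])  # hypothetical input size
C1, C2, C3, C4, C5 = get_resnet(input_image, stage5=True, train_bn=False)
# C2-C5 have strides 4, 8, 16, 32 and 256, 512, 1024, 2048 channels respectively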
Feature pyramid (FPN) code (the right half):
from keras.layers import Input, UpSampling2D

def get_predict_model(config):
    h, w = config.IMAGE_SHAPE[:2]
    # The input image must be divisible by 2 at least six times
    if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
        raise Exception("Image size must be dividable by 2 at least 6 times "
                        "to avoid fractions when downscaling and upscaling."
                        "For example, use 256, 320, 384, 448, 512, ... etc. ")

    input_image = Input(shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
    # meta carries some required image information
    input_image_meta = Input(shape=[config.IMAGE_META_SIZE], name="input_image_meta")
    # The anchors
    input_anchors = Input(shape=[None, 4], name="input_anchors")

    # Feature maps at different compression levels from ResNet
    _, C2, C3, C4, C5 = get_resnet(input_image, stage5=True, train_bn=config.TRAIN_BN)

    # Assemble the feature pyramid
    # P5: height and width compressed 5 times
    # Height/32, Width/32, 256
    P5 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
    # P4: height and width compressed 4 times
    # Height/16, Width/16, 256
    P4 = Add(name="fpn_p4add")([
        UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
    # P3: height and width compressed 3 times
    # Height/8, Width/8, 256
    P3 = Add(name="fpn_p3add")([
        UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
    # P2: height and width compressed 2 times
    # Height/4, Width/4, 256
    P2 = Add(name="fpn_p2add")([
        UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])

    # One 3x3 convolution on each level; P2-P5 now all have the same channel count
    # Height/4, Width/4, 256
    P2 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
    # Height/8, Width/8, 256
    P3 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
    # Height/16, Width/16, 256
    P4 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
    # Height/32, Width/32, 256
    P5 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
    # P6 is used only by the proposal network (RPN); stride-2 subsampling of P5
    # Height/64, Width/64, 256
    P6 = MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
    # ... (the rest of get_predict_model is omitted in this excerpt)
Obtaining proposals
To generate proposals, P2-P6 are fed through the RPN, which predicts an adjustment and an objectness score for every anchor. A shared 3×3 convolution first brings the channel count to 512; two 1×1 convolutions then predict, respectively, anchors_per_location × 4 box-adjustment parameters and anchors_per_location × 2 logits for whether each anchor contains an object. At this stage the prediction is still coarse.
The code:
from keras.layers import Reshape

def rpn_graph(feature_map, anchors_per_location):
    # Shared 3x3 convolution, 512 channels
    shared = Conv2D(512, (3, 3), padding='same', activation='relu',
                    name='rpn_conv_shared')(feature_map)

    x = Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
               activation='linear', name='rpn_class_raw')(shared)
    # batch_size, num_anchors, 2
    # Objectness logits for each anchor
    rpn_class_logits = Reshape([-1, 2])(x)
    rpn_probs = Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    x = Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
               activation='linear', name='rpn_bbox_pred')(shared)
    # batch_size, num_anchors, 4
    # Adjustment parameters for each anchor
    rpn_bbox = Reshape([-1, 4])(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]
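In the full model the same RPN weights are shared across all pyramid levels. A minimal sketch of that wiring, assuming P2-P6 from the FPN code above (the Model wrapper and Concatenate step follow the usual matterport-style pattern and are not shown in this excerpt):

from keras.models import Model
from keras.layers import Input, Concatenate

def build_rpn_model(anchors_per_location, depth):
    # Wrap rpn_graph in a Model so the same weights are applied to every level
    input_feature_map = Input(shape=[None, None, depth], name="input_rpn_feature_map")
    outputs = rpn_graph(input_feature_map, anchors_per_location)
    return Model([input_feature_map], outputs, name="rpn_model")

rpn = build_rpn_model(anchors_per_location=3, depth=256)
layer_outputs = [rpn([p]) for p in [P2, P3, P4, P5, P6]]
# Regroup into one tensor per output, concatenated along the anchor axis
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
rpn_class_logits, rpn_class, rpn_bbox = [
    Concatenate(axis=1, name=name)(list(outputs))
    for outputs, name in zip(zip(*layer_outputs), output_names)]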
Decoding the proposals
After prediction, the predicted adjustments are applied to the anchors so that the decoded boxes move closer to the ground-truth boxes.
Here is the decoding diagram that everyone reuses.
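The decoding itself is done by apply_box_deltas_graph, which the ProposalLayer below calls but which is not shown there; clip_boxes_graph then clamps the result to the image. Roughly, following the common matterport-style implementation, each anchor (y1, x1, y2, x2) is converted to center/size form, shifted by the predicted offsets, scaled by exp of the log-size deltas, and converted back:

import numpy as np
import tensorflow as tf

def apply_box_deltas_graph(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)]; deltas: [N, (dy, dx, log(dh), log(dw))]."""
    # Convert corner form to center/size form
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Apply the predicted adjustments
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])
    # Convert back to corner form
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    return tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")

def clip_boxes_graph(boxes, window):
    """Clamp boxes to the window, here the normalized image [0, 0, 1, 1]."""
    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped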
from keras.layers import Layer
# `utils` here is the repo's own module (utils.batch_slice)

class ProposalLayer(Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    # inputs: [rpn_class, rpn_bbox, anchors]
    def call(self, inputs):
        # Objectness score of each anchor [batch, num_rois, 1]
        scores = inputs[0][:, :, 1]
        # Adjustment parameters of each anchor [batch, num_rois, 4]
        deltas = inputs[1]
        # [0.1 0.1 0.2 0.2], rescales the deltas
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]

        # Keep only the top-scoring anchors (at most PRE_NMS_LIMIT, e.g. 6000)
        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
        # Indices of those anchors
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        # Their scores
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # Their adjustment parameters
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # The corresponding anchors
        pre_nms_anchors = utils.batch_slice([anchors, ix],
                                            lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])

        # Decode the anchors
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        # Clip to the image boundaries
        # [batch, N, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        # Non-max suppression
        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # If there are fewer proposals than requested, pad with zeros
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals

        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

    def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)
The ROIAlign layer
The ROIAlign layer crops a fixed-size feature map for every decoded box.
In the lower (classifier/bbox) branch, the cropped features are resized to 7×7×256.
In the upper (mask) branch, they are resized to 14×14×256.
class PyramidROIAlign(Layer):
    def __init__(self, pool_shape, **kwargs):
        super(PyramidROIAlign, self).__init__(**kwargs)
        self.pool_shape = tuple(pool_shape)

    def call(self, inputs):
        # Proposal boxes
        boxes = inputs[0]
        # image_meta carries some required image information
        image_meta = inputs[1]
        # All the feature maps [batch, height, width, channels]
        feature_maps = inputs[2:]

        y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
        h = y2 - y1
        w = x2 - x1

        # Size of the input image
        image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]
        # Assign each box to a pyramid level based on its size
        image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
        roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
        roi_level = tf.minimum(5, tf.maximum(
            2, 4 + tf.cast(tf.round(roi_level), tf.int32)))
        # batch_size, box_num
        roi_level = tf.squeeze(roi_level, 2)

        # Loop through levels and apply ROI pooling to each. P2 to P5.
        pooled = []
        box_to_level = []
        # Crop from P2-P5
        for i, level in enumerate(range(2, 6)):
            # Boxes assigned to this level
            ix = tf.where(tf.equal(roi_level, level))
            level_boxes = tf.gather_nd(boxes, ix)
            box_to_level.append(ix)

            # Which image each box belongs to
            box_indices = tf.cast(ix[:, 0], tf.int32)

            # Stop gradients through the box coordinates
            level_boxes = tf.stop_gradient(level_boxes)
            box_indices = tf.stop_gradient(box_indices)

            # Result: [batch * num_boxes, pool_height, pool_width, channels]
            pooled.append(tf.image.crop_and_resize(
                feature_maps[i], level_boxes, box_indices, self.pool_shape,
                method="bilinear"))

        pooled = tf.concat(pooled, axis=0)

        # Stack the crop order together with the owning image
        box_to_level = tf.concat(box_to_level, axis=0)
        box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
        box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                 axis=1)

        # box_to_level[:, 0] is the image index
        # box_to_level[:, 1] is the box index within that image
        sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
        # Sort so boxes of the same image are grouped together
        ix = tf.nn.top_k(sorting_tensor, k=tf.shape(
            box_to_level)[0]).indices[::-1]
        # Gather crops back into the original box order
        ix = tf.gather(box_to_level[:, 2], ix)
        pooled = tf.gather(pooled, ix)

        # Reshape back to the original format, i.e.
        # [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
        shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
        pooled = tf.reshape(pooled, shape)
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1], )
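The level assignment inside call follows equation (1) of the FPN paper, k = ⌊k0 + log2(√(wh)/224)⌋ with k0 = 4, clamped to [2, 5]. log2_graph is a tiny helper not shown above; in the matterport-style code it is just:

def log2_graph(x):
    """log2(x); TensorFlow 1.x has no built-in log2 op."""
    return tf.log(x) / tf.log(2.0)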
Building the classifier model
This model's predictions refine the proposals and produce the final boxes, together with a class for each box.
Code:
from keras.layers import TimeDistributed, Dense, Lambda
import keras.backend as K

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Align: crop each proposal from the feature pyramid
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)

    # Shape: [batch, num_rois, 1, 1, fc_layers_size]; equivalent to two fully connected layers
    x = TimeDistributed(Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                        name="mrcnn_class_conv1")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, 1, 1, fc_layers_size]
    x = TimeDistributed(Conv2D(fc_layers_size, (1, 1)), name="mrcnn_class_conv2")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, fc_layers_size]
    shared = Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2), name="pool_squeeze")(x)

    # Classifier head
    # Predicts the class of the object inside each proposal
    mrcnn_class_logits = TimeDistributed(Dense(num_classes), name='mrcnn_class_logits')(shared)
    mrcnn_probs = TimeDistributed(Activation("softmax"), name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # Predicts a refinement of each proposal
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = TimeDistributed(Dense(num_classes * 4, activation='linear'), name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    mrcnn_bbox = Reshape((-1, num_classes, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
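At inference each ROI keeps only the box refinement belonging to its highest-scoring class; a small numpy sketch of that selection (variable names are illustrative, not from the repo):

import numpy as np

# mrcnn_probs: [batch, num_rois, num_classes]; mrcnn_bbox: [batch, num_rois, num_classes, 4]
class_ids = np.argmax(mrcnn_probs[0], axis=1)                     # best class per ROI
deltas = mrcnn_bbox[0, np.arange(class_ids.shape[0]), class_ids]  # its 4 box deltas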
Building the mask model
The mask head applies four 3×3 convolutions to the 14×14 ROI features, upsamples them to 28×28 with a transposed convolution, and predicts one sigmoid mask per class with a final 1×1 convolution:
from keras.layers import Conv2DTranspose

def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
    # ROI Align: crop each proposal from the feature pyramid
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv1")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn1')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv2")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn2')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv3")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn3')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn4')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, 2xMASK_POOL_SIZE, 2xMASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                        name="mrcnn_mask_deconv")(x)
    # After the deconvolution, a 1x1 convolution reduces the channels to
    # num_classes: one sigmoid mask per class
    x = TimeDistributed(Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                        name="mrcnn_mask")(x)
    return x
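At inference the 28×28 class-specific mask still has to be pasted back onto the image. A minimal sketch of that post-processing step, assuming skimage for resizing and the usual 0.5 threshold (the function name and exact resize call are illustrative, following the common matterport-style utilities):

import numpy as np
from skimage.transform import resize

def unmold_mask(mask, bbox, image_shape, threshold=0.5):
    """Resize a small soft mask to its box and paste it into a full-size binary mask."""
    y1, x1, y2, x2 = bbox
    mask = resize(mask, (y2 - y1, x2 - x1))      # bilinear resize to the box size
    mask = np.where(mask >= threshold, 1, 0).astype(bool)
    full_mask = np.zeros(image_shape[:2], dtype=bool)
    full_mask[y1:y2, x1:x2] = mask               # paste into an image-sized canvas
    return full_mask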
Computing the loss
With the added mask branch, the loss for each ROI is the sum of three terms, L = L_cls + L_box + L_mask, where L_mask is the average binary cross-entropy over the predicted mask of the ground-truth class.
The code:
import tensorflow as tf
import keras.backend as K

def batch_pack_graph(x, counts, num_rows):
    """Picks different number of values from each row
    in x depending on the values in counts.
    """
    outputs = []
    for i in range(num_rows):
        outputs.append(x[i, :counts[i]])
    return tf.concat(outputs, axis=0)

def smooth_l1_loss(y_true, y_pred):
    """Implements Smooth-L1 loss.
    y_true and y_pred are typically: [N, 4], but could be any shape.
    """
    diff = K.abs(y_true - y_pred)
    less_than_one = K.cast(K.less(diff, 1.0), "float32")
    loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
    return loss

def rpn_class_loss_graph(rpn_match, rpn_class_logits):
    """RPN anchor classifier loss.

    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for BG/FG.
    """
    # Squeeze last dim to simplify
    rpn_match = tf.squeeze(rpn_match, -1)
    # Get anchor classes. Convert the -1/+1 match to 0/1 values.
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    # Positive and Negative anchors contribute to the loss,
    # but neutral anchors (match value = 0) don't.
    indices = tf.where(K.not_equal(rpn_match, 0))
    # Pick rows that contribute to the loss and filter out the rest.
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
    anchor_class = tf.gather_nd(anchor_class, indices)
    # Cross entropy loss
    loss = K.sparse_categorical_crossentropy(target=anchor_class,
                                             output=rpn_class_logits,
                                             from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss

def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
    """Return the RPN bounding box loss graph.

    config: the model config object.
    target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
        Uses 0 padding to fill in unused bbox deltas.
    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
    """
    # Positive anchors contribute to the loss, but negative and
    # neutral anchors (match value of 0 or -1) don't.
    rpn_match = K.squeeze(rpn_match, -1)
    indices = tf.where(K.equal(rpn_match, 1))
    # Pick bbox deltas that contribute to the loss
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)
    # Trim target bounding box deltas to the same length as rpn_bbox.
    batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
    target_bbox = batch_pack_graph(target_bbox, batch_counts,
                                   config.IMAGES_PER_GPU)
    loss = smooth_l1_loss(target_bbox, rpn_bbox)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss

def mrcnn_class_loss_graph(target_class_ids, pred_class_logits,
                           active_class_ids):
    """Loss for the classifier head of Mask RCNN.

    target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
        padding to fill in the array.
    pred_class_logits: [batch, num_rois, num_classes]
    active_class_ids: [batch, num_classes]. Has a value of 1 for
        classes that are in the dataset of the image, and 0
        for classes that are not in the dataset.
    """
    # During model building, Keras calls this function with
    # target_class_ids of type float32. Unclear why. Cast it
    # to int to get around it.
    target_class_ids = tf.cast(target_class_ids, 'int64')

    # Find predictions of classes that are not in the dataset.
    pred_class_ids = tf.argmax(pred_class_logits, axis=2)
    # TODO: Update this line to work with batch > 1. Right now it assumes all
    #       images in a batch have the same active_class_ids
    pred_active = tf.gather(active_class_ids[0], pred_class_ids)

    # Loss
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=target_class_ids, logits=pred_class_logits)

    # Erase losses of predictions of classes that are not in the active
    # classes of the image.
    loss = loss * pred_active

    # Compute loss mean. Use only predictions that contribute
    # to the loss to get a correct mean.
    loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
    return loss

def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
    """Loss for Mask R-CNN bounding box refinement.

    target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
    target_class_ids: [batch, num_rois]. Integer class IDs.
    pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
    """
    # Reshape to merge batch and roi dimensions for simplicity.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    target_bbox = K.reshape(target_bbox, (-1, 4))
    pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))

    # Only positive ROIs contribute to the loss. And only
    # the right class_id of each ROI. Get their indices.
    positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_roi_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_roi_ix), tf.int64)
    indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)

    # Gather the deltas (predicted and true) that contribute to loss
    target_bbox = tf.gather(target_bbox, positive_roi_ix)
    pred_bbox = tf.gather_nd(pred_bbox, indices)

    # Smooth-L1 Loss
    loss = K.switch(tf.size(target_bbox) > 0,
                    smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss

def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
    """Mask binary cross-entropy loss for the masks head.

    target_masks: [batch, num_rois, height, width].
        A float32 tensor of values 0 or 1. Uses zero padding to fill array.
    target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
    pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
        with values from 0 to 1.
    """
    # Reshape for simplicity. Merge first two dimensions into one.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    mask_shape = tf.shape(target_masks)
    target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
    pred_shape = tf.shape(pred_masks)
    pred_masks = K.reshape(pred_masks,
                           (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
    # Permute predicted masks to [N, num_classes, height, width]
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])

    # Only positive ROIs contribute to the loss. And only
    # the class specific mask of each ROI.
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_ix), tf.int64)
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)

    # Gather the masks (predicted and true) that contribute to loss
    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather_nd(pred_masks, indices)

    # Compute binary cross entropy. If no positive ROIs, then return 0.
    # shape: [batch, roi, num_classes]
    loss = K.switch(tf.size(y_true) > 0,
                    K.binary_crossentropy(target=y_true, output=y_pred),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss
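In the training graph these functions are attached to the Keras model as named Lambda layers and summed at compile time; a minimal sketch of the usual wiring (the input tensor names are illustrative, following the matterport-style build):

from keras.layers import Lambda

rpn_class_loss = Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
    [input_rpn_match, rpn_class_logits])
rpn_bbox_loss = Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
    [target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
    [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    [target_mask, target_class_ids, mrcnn_mask])
# Each loss tensor is then added to the model with model.add_loss (optionally
# scaled by a per-loss weight from the config).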
Source: blog.csdn.net. Author: 快了的程序猿小可哥. Copyright belongs to the original author; please contact the author for reprint permission.
Original link: blog.csdn.net/qq_35914625/article/details/108141999