Fun with Instance Segmentation: Mask R-CNN
Preface
It has been a few days since my last post. Today I will walk through Mask R-CNN, a paper I read a while ago, and go through its code along the way.
What is Mask R-CNN?
Mask R-CNN was proposed by Kaiming He et al. It is a two-stage algorithm built on Faster R-CNN that performs not only object detection but also high-quality, pixel-level segmentation. Its main idea is to add a mask (segmentation) branch on top of the original Faster R-CNN detection head.
Advantages of Mask R-CNN
1. It uses an FPN (feature pyramid network). In earlier detectors such as Fast R-CNN, all RoI operations act on the final feature map. That is fine for large objects, but accuracy drops for small ones: an RoI is mapped onto a feature map simply by dividing its image coordinates by the stride, so after repeated convolution and pooling a small RoI shrinks to almost nothing (or nothing at all) and its semantic information is lost. To handle detection across scales, the FPN was introduced: it naturally reuses the CNN's own layers, fusing high-resolution shallow layers with deep layers so that the high-resolution maps also carry strong semantics. Below is the over-used FPN figure.
2. It uses ROIAlign. Suppose a region proposal in the original image is 665×665. Mapped to the feature map (stride 32) its size is 665/32 = 20.78, i.e. 20.78×20.78. Instead of rounding to an integer as RoI Pooling does, ROIAlign keeps the floating-point coordinates and samples values with bilinear interpolation. The problem with RoI Pooling is that if its output is 7×7 but the RoI on the feature map is 8×8, input and output pixels cannot correspond one-to-one: some output cells cover one input cell and others two, so they carry different amounts of information, and the output coordinates no longer align with the input. A small numeric sketch of the difference follows this list.
3. It adds a segmentation branch that decouples mask prediction from class prediction: the mask branch only predicts masks, and classification is left to a separate branch. This differs from the original FCN, which predicts the mask and the class it belongs to at the same time.
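To make the quantization point concrete, here is a small self-contained sketch (not from the repo) comparing RoI Pooling's rounding with ROIAlign-style bilinear sampling at a single float coordinate:

import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at float coordinates with bilinear interpolation."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feature.shape[0] - 1), min(x0 + 1, feature.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feature[y0, x0] * (1 - dy) * (1 - dx) +
            feature[y0, x1] * (1 - dy) * dx +
            feature[y1, x0] * dy * (1 - dx) +
            feature[y1, x1] * dy * dx)

feature = np.random.rand(32, 32).astype(np.float32)
y = x = 20.78  # the float coordinate from the 665/32 example above
quantized = feature[int(round(y)), int(round(x))]  # RoI Pooling: round first, then read
aligned = bilinear_sample(feature, y, x)           # ROIAlign: interpolate at 20.78 directly
print(quantized, aligned)

ROIAlign does this for several sample points inside each output bin and averages them, so no coordinate is ever rounded.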
GitHub
https://github.com/yanjingke/mask-rcnn-keras
How Mask R-CNN is implemented
1. Backbone network
The paper proposes ResNet+FPN as the feature-extraction network, which achieves state-of-the-art results. ResNet-101 is used here. ResNet-101 is built from conv blocks and identity blocks: a conv block changes the spatial size between input and output, while an identity block keeps input and output the same size.
Stage 1 (C1) applies one plain convolution and a max-pooling layer, compressing height and width by 4× and producing 64 channels.
Stage 2 (C2) applies one conv_block and two identity_blocks; height and width remain compressed 4×, with 256 channels.
Stage 3 (C3) applies one conv_block and three identity_blocks, compressing height and width 8×, with 512 channels.
Stage 4 (C4) applies one conv_block and 22 identity_blocks, compressing height and width 16×, with 1024 channels.
Stage 5 (C5) applies one conv_block and two identity_blocks, compressing height and width 32×, with 2048 channels.
The feature pyramid then combines C2-C5 through 1×1 lateral convolutions and upsampling to produce P2-P5 (plus P6, a downsampled copy of P5 used only by the RPN).
ResNet-101 code (the left half of the architecture diagram):
from keras.layers import ZeroPadding2D, Conv2D, MaxPooling2D, BatchNormalization, Activation, Add
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # 1x1 convolution to reduce the channel count
    x = Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
               use_bias=use_bias)(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x, training=train_bn)
    x = Activation('relu')(x)

    # kernel_size x kernel_size convolution
    x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
               name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x, training=train_bn)
    x = Activation('relu')(x)

    # 1x1 convolution to restore the channel count
    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
               use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x, training=train_bn)

    # Residual connection: input and output have identical shapes
    x = Add()([x, input_tensor])
    x = Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Strided 1x1 convolution: this is where the spatial size changes
    x = Conv2D(nb_filter1, (1, 1), strides=strides,
               name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
               name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x, training=train_bn)
    x = Activation('relu')(x)

    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
               use_bias=use_bias)(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x, training=train_bn)

    # Project the shortcut so its shape matches the main path
    shortcut = Conv2D(nb_filter3, (1, 1), strides=strides,
                      name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = Add()([x, shortcut])
    x = Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
def get_resnet(input_image, stage5=False, train_bn=True):
    # Stage 1
    x = ZeroPadding2D((3, 3))(input_image)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNormalization(name='bn_conv1')(x, training=train_bn)
    x = Activation('relu')(x)
    # Height/4, Width/4, 64
    C1 = x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    # Height/4, Width/4, 256
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    # Height/8, Width/8, 512
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = 22
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    # Height/16, Width/16, 1024
    C4 = x
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        # Height/32, Width/32, 2048
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
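A minimal usage sketch of the backbone (the 512×512 input shape is just an illustrative assumption):

from keras.layers import Input

input_image = Input(shape=[512, 512, 3])  # hypothetical input size
C1, C2, C3, C4, C5 = get_resnet(input_image, stage5=True, train_bn=False)
# C2-C5 have strides 4, 8, 16, 32 and 256, 512, 1024, 2048 channels respectively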
Feature pyramid (FPN) code (the right half):
from keras.layers import Input, UpSampling2D

def get_predict_model(config):
    h, w = config.IMAGE_SHAPE[:2]
    # The input image must be divisible by 2 at least six times
    if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
        raise Exception("Image size must be dividable by 2 at least 6 times "
                        "to avoid fractions when downscaling and upscaling."
                        "For example, use 256, 320, 384, 448, 512, ... etc. ")

    input_image = Input(shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
    # meta carries some required image information
    input_image_meta = Input(shape=[config.IMAGE_META_SIZE], name="input_image_meta")
    # The anchors
    input_anchors = Input(shape=[None, 4], name="input_anchors")

    # Feature maps at different compression levels from ResNet
    _, C2, C3, C4, C5 = get_resnet(input_image, stage5=True, train_bn=config.TRAIN_BN)

    # Assemble the feature pyramid
    # P5: height and width compressed 5 times
    # Height/32, Width/32, 256
    P5 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
    # P4: height and width compressed 4 times
    # Height/16, Width/16, 256
    P4 = Add(name="fpn_p4add")([
        UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
    # P3: height and width compressed 3 times
    # Height/8, Width/8, 256
    P3 = Add(name="fpn_p3add")([
        UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
    # P2: height and width compressed 2 times
    # Height/4, Width/4, 256
    P2 = Add(name="fpn_p2add")([
        UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
        Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])

    # One 3x3 convolution on each level; P2-P5 now all have the same channel count
    # Height/4, Width/4, 256
    P2 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
    # Height/8, Width/8, 256
    P3 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
    # Height/16, Width/16, 256
    P4 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
    # Height/32, Width/32, 256
    P5 = Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
    # P6 is used only by the proposal network (RPN); stride-2 subsampling of P5
    # Height/64, Width/64, 256
    P6 = MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
    # ... (the rest of get_predict_model is omitted in this excerpt)
Obtaining proposals
To generate proposals, P2-P6 are fed through the RPN, which predicts an adjustment and an objectness score for every anchor. A shared 3×3 convolution first brings the channel count to 512; two 1×1 convolutions then predict, respectively, anchors_per_location × 4 box-adjustment parameters and anchors_per_location × 2 logits for whether each anchor contains an object. At this stage the prediction is still coarse.
The code:
from keras.layers import Reshape

def rpn_graph(feature_map, anchors_per_location):
    # Shared 3x3 convolution, 512 channels
    shared = Conv2D(512, (3, 3), padding='same', activation='relu',
                    name='rpn_conv_shared')(feature_map)

    x = Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
               activation='linear', name='rpn_class_raw')(shared)
    # batch_size, num_anchors, 2
    # Objectness logits for each anchor
    rpn_class_logits = Reshape([-1, 2])(x)
    rpn_probs = Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    x = Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
               activation='linear', name='rpn_bbox_pred')(shared)
    # batch_size, num_anchors, 4
    # Adjustment parameters for each anchor
    rpn_bbox = Reshape([-1, 4])(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]
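In the full model the same RPN weights are shared across all pyramid levels. A minimal sketch of that wiring, assuming P2-P6 from the FPN code above (the Model wrapper and Concatenate step follow the usual matterport-style pattern and are not shown in this excerpt):

from keras.models import Model
from keras.layers import Input, Concatenate

def build_rpn_model(anchors_per_location, depth):
    # Wrap rpn_graph in a Model so the same weights are applied to every level
    input_feature_map = Input(shape=[None, None, depth], name="input_rpn_feature_map")
    outputs = rpn_graph(input_feature_map, anchors_per_location)
    return Model([input_feature_map], outputs, name="rpn_model")

rpn = build_rpn_model(anchors_per_location=3, depth=256)
layer_outputs = [rpn([p]) for p in [P2, P3, P4, P5, P6]]
# Regroup into one tensor per output, concatenated along the anchor axis
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
rpn_class_logits, rpn_class, rpn_bbox = [
    Concatenate(axis=1, name=name)(list(outputs))
    for outputs, name in zip(zip(*layer_outputs), output_names)]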
Decoding the proposals
After prediction, the predicted adjustments are applied to the anchors so that the decoded boxes move closer to the ground-truth boxes.
Here is the decoding diagram that everyone reuses.
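The decoding itself is done by apply_box_deltas_graph, which the ProposalLayer below calls but which is not shown there; clip_boxes_graph then clamps the result to the image. Roughly, following the common matterport-style implementation, each anchor (y1, x1, y2, x2) is converted to center/size form, shifted by the predicted offsets, scaled by exp of the log-size deltas, and converted back:

import numpy as np
import tensorflow as tf

def apply_box_deltas_graph(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)]; deltas: [N, (dy, dx, log(dh), log(dw))]."""
    # Convert corner form to center/size form
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Apply the predicted adjustments
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])
    # Convert back to corner form
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    return tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")

def clip_boxes_graph(boxes, window):
    """Clamp boxes to the window, here the normalized image [0, 0, 1, 1]."""
    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped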
from keras.layers import Layer
# `utils` here is the repo's own module (utils.batch_slice)

class ProposalLayer(Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    # inputs: [rpn_class, rpn_bbox, anchors]
    def call(self, inputs):
        # Objectness score of each anchor [batch, num_rois, 1]
        scores = inputs[0][:, :, 1]
        # Adjustment parameters of each anchor [batch, num_rois, 4]
        deltas = inputs[1]
        # [0.1 0.1 0.2 0.2], rescales the deltas
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]

        # Keep only the top-scoring anchors (at most PRE_NMS_LIMIT, e.g. 6000)
        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
        # Indices of those anchors
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        # Their scores
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # Their adjustment parameters
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # The corresponding anchors
        pre_nms_anchors = utils.batch_slice([anchors, ix],
                                            lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])

        # Decode the anchors
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        # Clip to the image boundaries
        # [batch, N, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        # Non-max suppression
        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # If there are fewer proposals than requested, pad with zeros
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals

        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

    def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)
The ROIAlign layer
The ROIAlign layer crops a fixed-size feature map for every decoded box.
In the lower (classifier/bbox) branch, the cropped features are resized to 7×7×256.
In the upper (mask) branch, they are resized to 14×14×256.
class PyramidROIAlign(Layer):
    def __init__(self, pool_shape, **kwargs):
        super(PyramidROIAlign, self).__init__(**kwargs)
        self.pool_shape = tuple(pool_shape)

    def call(self, inputs):
        # Proposal boxes
        boxes = inputs[0]
        # image_meta carries some required image information
        image_meta = inputs[1]
        # All the feature maps [batch, height, width, channels]
        feature_maps = inputs[2:]

        y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
        h = y2 - y1
        w = x2 - x1

        # Size of the input image
        image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]
        # Assign each box to a pyramid level based on its size
        image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
        roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
        roi_level = tf.minimum(5, tf.maximum(
            2, 4 + tf.cast(tf.round(roi_level), tf.int32)))
        # batch_size, box_num
        roi_level = tf.squeeze(roi_level, 2)

        # Loop through levels and apply ROI pooling to each. P2 to P5.
        pooled = []
        box_to_level = []
        # Crop from P2-P5
        for i, level in enumerate(range(2, 6)):
            # Boxes assigned to this level
            ix = tf.where(tf.equal(roi_level, level))
            level_boxes = tf.gather_nd(boxes, ix)
            box_to_level.append(ix)

            # Which image each box belongs to
            box_indices = tf.cast(ix[:, 0], tf.int32)

            # Stop gradients through the box coordinates
            level_boxes = tf.stop_gradient(level_boxes)
            box_indices = tf.stop_gradient(box_indices)

            # Result: [batch * num_boxes, pool_height, pool_width, channels]
            pooled.append(tf.image.crop_and_resize(
                feature_maps[i], level_boxes, box_indices, self.pool_shape,
                method="bilinear"))

        pooled = tf.concat(pooled, axis=0)

        # Stack the crop order together with the owning image
        box_to_level = tf.concat(box_to_level, axis=0)
        box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
        box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                 axis=1)

        # box_to_level[:, 0] is the image index
        # box_to_level[:, 1] is the box index within that image
        sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
        # Sort so boxes of the same image are grouped together
        ix = tf.nn.top_k(sorting_tensor, k=tf.shape(
            box_to_level)[0]).indices[::-1]
        # Gather crops back into the original box order
        ix = tf.gather(box_to_level[:, 2], ix)
        pooled = tf.gather(pooled, ix)

        # Reshape back to the original format, i.e.
        # [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
        shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
        pooled = tf.reshape(pooled, shape)
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1], )
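The level assignment inside call follows equation (1) of the FPN paper, k = ⌊k0 + log2(√(wh)/224)⌋ with k0 = 4, clamped to [2, 5]. log2_graph is a tiny helper not shown above; in the matterport-style code it is just:

def log2_graph(x):
    """log2(x); TensorFlow 1.x has no built-in log2 op."""
    return tf.log(x) / tf.log(2.0)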
Building the classifier model
This model's predictions refine the proposals and produce the final boxes, together with a class for each box.
Code:
from keras.layers import TimeDistributed, Dense, Lambda
import keras.backend as K

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Align: crop each proposal from the feature pyramid
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)

    # Shape: [batch, num_rois, 1, 1, fc_layers_size]; equivalent to two fully connected layers
    x = TimeDistributed(Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                        name="mrcnn_class_conv1")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, 1, 1, fc_layers_size]
    x = TimeDistributed(Conv2D(fc_layers_size, (1, 1)), name="mrcnn_class_conv2")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, fc_layers_size]
    shared = Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2), name="pool_squeeze")(x)

    # Classifier head
    # Predicts the class of the object inside each proposal
    mrcnn_class_logits = TimeDistributed(Dense(num_classes), name='mrcnn_class_logits')(shared)
    mrcnn_probs = TimeDistributed(Activation("softmax"), name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # Predicts a refinement of each proposal
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = TimeDistributed(Dense(num_classes * 4, activation='linear'), name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    mrcnn_bbox = Reshape((-1, num_classes, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
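At inference each ROI keeps only the box refinement belonging to its highest-scoring class; a small numpy sketch of that selection (variable names are illustrative, not from the repo):

import numpy as np

# mrcnn_probs: [batch, num_rois, num_classes]; mrcnn_bbox: [batch, num_rois, num_classes, 4]
class_ids = np.argmax(mrcnn_probs[0], axis=1)                     # best class per ROI
deltas = mrcnn_bbox[0, np.arange(class_ids.shape[0]), class_ids]  # its 4 box deltas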
Building the mask model
The mask head applies four 3×3 convolutions to the 14×14 ROI features, upsamples them to 28×28 with a transposed convolution, and predicts one sigmoid mask per class with a final 1×1 convolution:
from keras.layers import Conv2DTranspose

def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
    # ROI Align: crop each proposal from the feature pyramid
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv1")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn1')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv2")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn2')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv3")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn3')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4")(x)
    x = TimeDistributed(BatchNormalization(), name='mrcnn_mask_bn4')(x, training=train_bn)
    x = Activation('relu')(x)

    # Shape: [batch, num_rois, 2xMASK_POOL_SIZE, 2xMASK_POOL_SIZE, channels]
    x = TimeDistributed(Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                        name="mrcnn_mask_deconv")(x)
    # After the deconvolution, a 1x1 convolution reduces the channels to
    # num_classes: one sigmoid mask per class
    x = TimeDistributed(Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                        name="mrcnn_mask")(x)
    return x
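At inference the 28×28 class-specific mask still has to be pasted back onto the image. A minimal sketch of that post-processing step, assuming skimage for resizing and the usual 0.5 threshold (the function name and exact resize call are illustrative, following the common matterport-style utilities):

import numpy as np
from skimage.transform import resize

def unmold_mask(mask, bbox, image_shape, threshold=0.5):
    """Resize a small soft mask to its box and paste it into a full-size binary mask."""
    y1, x1, y2, x2 = bbox
    mask = resize(mask, (y2 - y1, x2 - x1))      # bilinear resize to the box size
    mask = np.where(mask >= threshold, 1, 0).astype(bool)
    full_mask = np.zeros(image_shape[:2], dtype=bool)
    full_mask[y1:y2, x1:x2] = mask               # paste into an image-sized canvas
    return full_mask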
Computing the loss
With the added mask branch, the loss for each ROI is the sum of three terms, L = L_cls + L_box + L_mask, where L_mask is the average binary cross-entropy over the predicted mask of the ground-truth class.
The code:
import tensorflow as tf
import keras.backend as K

def batch_pack_graph(x, counts, num_rows):
    """Picks different number of values from each row
    in x depending on the values in counts.
    """
    outputs = []
    for i in range(num_rows):
        outputs.append(x[i, :counts[i]])
    return tf.concat(outputs, axis=0)

def smooth_l1_loss(y_true, y_pred):
    """Implements Smooth-L1 loss.
    y_true and y_pred are typically: [N, 4], but could be any shape.
    """
    diff = K.abs(y_true - y_pred)
    less_than_one = K.cast(K.less(diff, 1.0), "float32")
    loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
    return loss

def rpn_class_loss_graph(rpn_match, rpn_class_logits):
    """RPN anchor classifier loss.

    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for BG/FG.
    """
    # Squeeze last dim to simplify
    rpn_match = tf.squeeze(rpn_match, -1)
    # Get anchor classes. Convert the -1/+1 match to 0/1 values.
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    # Positive and Negative anchors contribute to the loss,
    # but neutral anchors (match value = 0) don't.
    indices = tf.where(K.not_equal(rpn_match, 0))
    # Pick rows that contribute to the loss and filter out the rest.
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
    anchor_class = tf.gather_nd(anchor_class, indices)
    # Cross entropy loss
    loss = K.sparse_categorical_crossentropy(target=anchor_class,
                                             output=rpn_class_logits,
                                             from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss

def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
    """Return the RPN bounding box loss graph.

    config: the model config object.
    target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
        Uses 0 padding to fill in unused bbox deltas.
    rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
               -1=negative, 0=neutral anchor.
    rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
    """
    # Positive anchors contribute to the loss, but negative and
    # neutral anchors (match value of 0 or -1) don't.
    rpn_match = K.squeeze(rpn_match, -1)
    indices = tf.where(K.equal(rpn_match, 1))
    # Pick bbox deltas that contribute to the loss
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)
    # Trim target bounding box deltas to the same length as rpn_bbox.
    batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
    target_bbox = batch_pack_graph(target_bbox, batch_counts,
                                   config.IMAGES_PER_GPU)
    loss = smooth_l1_loss(target_bbox, rpn_bbox)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    return loss

def mrcnn_class_loss_graph(target_class_ids, pred_class_logits,
                           active_class_ids):
    """Loss for the classifier head of Mask RCNN.

    target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
        padding to fill in the array.
    pred_class_logits: [batch, num_rois, num_classes]
    active_class_ids: [batch, num_classes]. Has a value of 1 for
        classes that are in the dataset of the image, and 0
        for classes that are not in the dataset.
    """
    # During model building, Keras calls this function with
    # target_class_ids of type float32. Unclear why. Cast it
    # to int to get around it.
    target_class_ids = tf.cast(target_class_ids, 'int64')

    # Find predictions of classes that are not in the dataset.
    pred_class_ids = tf.argmax(pred_class_logits, axis=2)
    # TODO: Update this line to work with batch > 1. Right now it assumes all
    #       images in a batch have the same active_class_ids
    pred_active = tf.gather(active_class_ids[0], pred_class_ids)

    # Loss
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=target_class_ids, logits=pred_class_logits)

    # Erase losses of predictions of classes that are not in the active
    # classes of the image.
    loss = loss * pred_active

    # Compute loss mean. Use only predictions that contribute
    # to the loss to get a correct mean.
    loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
    return loss

def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
    """Loss for Mask R-CNN bounding box refinement.

    target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
    target_class_ids: [batch, num_rois]. Integer class IDs.
    pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
    """
    # Reshape to merge batch and roi dimensions for simplicity.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    target_bbox = K.reshape(target_bbox, (-1, 4))
    pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))

    # Only positive ROIs contribute to the loss. And only
    # the right class_id of each ROI. Get their indices.
    positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_roi_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_roi_ix), tf.int64)
    indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)

    # Gather the deltas (predicted and true) that contribute to loss
    target_bbox = tf.gather(target_bbox, positive_roi_ix)
    pred_bbox = tf.gather_nd(pred_bbox, indices)

    # Smooth-L1 Loss
    loss = K.switch(tf.size(target_bbox) > 0,
                    smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss

def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
    """Mask binary cross-entropy loss for the masks head.

    target_masks: [batch, num_rois, height, width].
        A float32 tensor of values 0 or 1. Uses zero padding to fill array.
    target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
    pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
        with values from 0 to 1.
    """
    # Reshape for simplicity. Merge first two dimensions into one.
    target_class_ids = K.reshape(target_class_ids, (-1,))
    mask_shape = tf.shape(target_masks)
    target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
    pred_shape = tf.shape(pred_masks)
    pred_masks = K.reshape(pred_masks,
                           (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
    # Permute predicted masks to [N, num_classes, height, width]
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])

    # Only positive ROIs contribute to the loss. And only
    # the class specific mask of each ROI.
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_ix), tf.int64)
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)

    # Gather the masks (predicted and true) that contribute to loss
    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather_nd(pred_masks, indices)

    # Compute binary cross entropy. If no positive ROIs, then return 0.
    # shape: [batch, roi, num_classes]
    loss = K.switch(tf.size(y_true) > 0,
                    K.binary_crossentropy(target=y_true, output=y_pred),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss
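In the training graph these functions are attached to the Keras model as named Lambda layers and summed at compile time; a minimal sketch of the usual wiring (the input tensor names are illustrative, following the matterport-style build):

from keras.layers import Lambda

rpn_class_loss = Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
    [input_rpn_match, rpn_class_logits])
rpn_bbox_loss = Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
    [target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
    [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    [target_mask, target_class_ids, mrcnn_mask])
# Each loss tensor is then added to the model with model.add_loss (optionally
# scaled by a per-loss weight from the config).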
Source: blog.csdn.net. Author: 快了的程序猿小可哥. Copyright belongs to the original author; please contact the author for reprint permission.
Original link: blog.csdn.net/qq_35914625/article/details/108141999