Rethinking Shift Invariance in Neural Networks: Reproducing the Anti-Aliased Downsampling Paper
[Abstract] The author examines each component of the CNN and argues that convolutional layers themselves are shift-equivariant, while pooling layers break shift invariance. The proposed fix borrows the anti-aliasing design from signal processing: low-pass filter the signal (i.e., blur the image) before downsampling, which mitigates the damage pooling does to shift invariance.
1. Introduction
Deep CNNs are commonly assumed to be invariant to image translation and deformation. However, the paper Why do deep convolutional networks generalize so poorly to small image transformations? shows otherwise: when an image is shifted by just a few pixels in the image plane, the outputs of modern CNNs (such as VGG16, ResNet50, and InceptionResNetV2) can change dramatically. Moreover, the smaller the image, the worse the network's recognition performance, and network depth also affects its error rate.
2. Paper Interpretation
In Making Convolutional Networks Shift-Invariant Again, the baseline experiments show that a CNN's predictions swing widely as the input image shifts: once the target is translated by a few pixels, it is no longer predicted correctly. The author examines each component of the CNN and argues that convolutional layers themselves are shift-equivariant, while pooling layers break shift invariance. The proposed remedy borrows the anti-aliasing design from signal processing: low-pass filter the signal (i.e., blur the image) before downsampling, mitigating the damage pooling does to shift invariance. Concretely:
The paper views the original MaxPool operation as two steps: first a dense (stride-1) Max, then subsampling.
The fix inserts an image blur after the Max and before the subsampling. The same treatment is applied to every operation that involves downsampling: StridedConv and AveragePool are reworked into ConvBlurPool and BlurPool, each blurring before it downsamples. The experiments also study blur kernels of different types and parameters.
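As a concrete illustration, here is a minimal sketch in Paddle (an illustration for this write-up, not the paper's reference code) showing that MaxPool factors exactly into a dense max followed by subsampling, which is the seam where the blur is inserted:

import paddle
import paddle.nn.functional as F

x = paddle.randn([1, 8, 32, 32])

# Plain MaxPool with kernel 2, stride 2 ...
y1 = F.max_pool2d(x, kernel_size=2, stride=2)

# ... factors exactly into a dense (stride-1) max followed by subsampling.
dense_max = F.max_pool2d(x, kernel_size=2, stride=1)
y2 = dense_max[:, :, ::2, ::2]

print(paddle.allclose(y1, y2).item())  # True

# Anti-aliasing inserts a low-pass blur between the two steps:
#   dense max -> blur -> subsample
# The blur + subsample pair is the BlurPool layer implemented below.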
3. Reproduction Details
import paddle
import numpy as np
import paddle.nn as nn
import paddle.nn.functional as F

def get_pad_layer(pad_type):
    # Paddle exposes reflect/replicate padding through nn.Pad2D's `mode`
    # argument, so wrap it to mirror PyTorch's ReflectionPad2d / ReplicationPad2d.
    if pad_type in ['refl', 'reflect']:
        PadLayer = lambda padding: nn.Pad2D(padding, mode='reflect')
    elif pad_type in ['repl', 'replicate']:
        PadLayer = lambda padding: nn.Pad2D(padding, mode='replicate')
    elif pad_type == 'zero':
        PadLayer = nn.ZeroPad2D
    else:
        raise ValueError('Pad type [%s] not recognized' % pad_type)
    return PadLayer

class BlurPool(nn.Layer):
    def __init__(self, channels, pad_type='reflect', filt_size=4, stride=2, pad_off=0):
        super(BlurPool, self).__init__()
        self.filt_size = filt_size
        self.pad_off = pad_off
        # [left, right, top, bottom]; asymmetric for even filter sizes
        self.pad_sizes = [int(1. * (filt_size - 1) / 2), int(np.ceil(1. * (filt_size - 1) / 2)),
                          int(1. * (filt_size - 1) / 2), int(np.ceil(1. * (filt_size - 1) / 2))]
        self.pad_sizes = [pad_size + pad_off for pad_size in self.pad_sizes]
        self.stride = stride
        self.off = int((self.stride - 1) / 2.)
        self.channels = channels
        # 1-D binomial coefficients; their outer product approximates a Gaussian
        if self.filt_size == 1:
            a = np.array([1.])
        elif self.filt_size == 2:
            a = np.array([1., 1.])
        elif self.filt_size == 3:
            a = np.array([1., 2., 1.])
        elif self.filt_size == 4:
            a = np.array([1., 3., 3., 1.])
        elif self.filt_size == 5:
            a = np.array([1., 4., 6., 4., 1.])
        elif self.filt_size == 6:
            a = np.array([1., 5., 10., 10., 5., 1.])
        elif self.filt_size == 7:
            a = np.array([1., 6., 15., 20., 15., 6., 1.])
        filt = paddle.to_tensor(a[:, None] * a[None, :], dtype='float32')
        filt = filt / paddle.sum(filt)  # normalize so the blur preserves brightness
        # one copy of the fixed, non-trainable kernel per channel, for a grouped conv
        self.register_buffer('filt', filt.unsqueeze([0, 1]).tile([self.channels, 1, 1, 1]))
        self.pad = get_pad_layer(pad_type)(self.pad_sizes)

    def forward(self, inp):
        if self.filt_size == 1:
            # a 1x1 "blur" is a no-op: just subsample
            if self.pad_off == 0:
                return inp[:, :, ::self.stride, ::self.stride]
            else:
                return self.pad(inp)[:, :, ::self.stride, ::self.stride]
        else:
            # depthwise (grouped) convolution with the fixed blur kernel;
            # the stride performs the subsampling
            return F.conv2d(self.pad(inp), self.filt, stride=self.stride, groups=inp.shape[1])
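A quick sanity check of the layer (a usage sketch, assuming the BlurPool class above): it halves the spatial resolution while contributing no trainable parameters.

# BlurPool with a 4x4 binomial kernel: blur, then downsample by 2
pool = BlurPool(channels=64, filt_size=4, stride=2)
x = paddle.randn([1, 64, 56, 56])
print(pool(x).shape)  # [1, 64, 28, 28]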
# Replacing MaxPool with BlurPool layers.
# Note: the paper's MaxBlurPool keeps a stride-1 Max before the blur; in this
# reproduction the max step is dropped and MaxPool is swapped for BlurPool
# directly. With the default filter_size=1, no blurring happens at all.
class AlexNetNMP(nn.Layer):
    def __init__(self, num_classes=10, filter_size=1):
        super(AlexNetNMP, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2D(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(),
            BlurPool(64, filt_size=filter_size, stride=2),
            nn.Conv2D(64, 192, kernel_size=5, padding=2),
            nn.ReLU(),
            BlurPool(192, filt_size=filter_size, stride=2),
            nn.Conv2D(192, 384, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2D(384, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2D(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            BlurPool(256, filt_size=filter_size, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2D((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.reshape([x.shape[0], 256 * 6 * 6])
        x = self.classifier(x)
        return x
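One caveat before the comparison: the experiments below instantiate this network with the default filter_size=1, in which case every BlurPool degenerates to plain subsampling. To actually exercise the anti-aliasing, one would pass a larger kernel, for example:

# Hypothetical instantiation with a real 4x4 binomial blur kernel
# (the experiments below use the default filter_size=1 instead)
alexnetnmp_blur = AlexNetNMP(num_classes=10, filter_size=4)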
4. Network Structure Visualization
alexnetnmp = AlexNetNMP(num_classes=10)
paddle.summary(alexnetnmp, (1, 3, 224, 224))
-------------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===============================================================================
Conv2D-1 [[1, 3, 224, 224]] [1, 64, 55, 55] 23,296
ReLU-1 [[1, 64, 55, 55]] [1, 64, 55, 55] 0
BlurPool-1 [[1, 64, 55, 55]] [1, 64, 28, 28] 0
Conv2D-2 [[1, 64, 28, 28]] [1, 192, 28, 28] 307,392
ReLU-2 [[1, 192, 28, 28]] [1, 192, 28, 28] 0
BlurPool-2 [[1, 192, 28, 28]] [1, 192, 14, 14] 0
Conv2D-3 [[1, 192, 14, 14]] [1, 384, 14, 14] 663,936
ReLU-3 [[1, 384, 14, 14]] [1, 384, 14, 14] 0
Conv2D-4 [[1, 384, 14, 14]] [1, 256, 14, 14] 884,992
ReLU-4 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
Conv2D-5 [[1, 256, 14, 14]] [1, 256, 14, 14] 590,080
ReLU-5 [[1, 256, 14, 14]] [1, 256, 14, 14] 0
BlurPool-3 [[1, 256, 14, 14]] [1, 256, 7, 7] 0
AdaptiveAvgPool2D-1 [[1, 256, 7, 7]] [1, 256, 6, 6] 0
Dropout-1 [[1, 9216]] [1, 9216] 0
Linear-1 [[1, 9216]] [1, 4096] 37,752,832
ReLU-6 [[1, 4096]] [1, 4096] 0
Dropout-2 [[1, 4096]] [1, 4096] 0
Linear-2 [[1, 4096]] [1, 4096] 16,781,312
ReLU-7 [[1, 4096]] [1, 4096] 0
Linear-3 [[1, 4096]] [1, 10] 40,970
===============================================================================
Total params: 57,044,810
Trainable params: 57,044,810
Non-trainable params: 0
-------------------------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 8.99
Params size (MB): 217.61
Estimated Total Size (MB): 227.18
-------------------------------------------------------------------------------
{'total_params': 57044810, 'trainable_params': 57044810}
5. Comparison Experiments
import paddle
from paddle.metric import Accuracy
from paddle.vision.transforms import Compose, Normalize, Resize, ToTensor

callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir_alexnetnmp')

# Resize first (on the HWC image), then ToTensor (scales to [0, 1] and
# converts to CHW), then Normalize in CHW layout to map values to [-1, 1]
transform = Compose([Resize(size=(224, 224)),
                     ToTensor(),
                     Normalize(mean=[0.5, 0.5, 0.5],
                               std=[0.5, 0.5, 0.5],
                               data_format='CHW')])

cifar10_train = paddle.vision.datasets.Cifar10(mode='train', transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test', transform=transform)

# Build the training data loader
train_loader = paddle.io.DataLoader(cifar10_train, batch_size=768, shuffle=True, drop_last=True)
# Build the test data loader (drop_last=True explains the 9984 eval samples below)
test_loader = paddle.io.DataLoader(cifar10_test, batch_size=768, shuffle=False, drop_last=True)

alexnmp = paddle.Model(AlexNetNMP(num_classes=10))
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=alexnmp.parameters())
alexnmp.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)
alexnmp.fit(train_data=train_loader,
            eval_data=test_loader,
            epochs=12,
            callbacks=callback,
            verbose=1
            )
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/12
step 65/65 [==============================] - loss: 1.9855 - acc: 0.1553 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.9558 - acc: 0.2991 - 1s/step
Eval samples: 9984
Epoch 2/12
step 65/65 [==============================] - loss: 1.5948 - acc: 0.3630 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.4321 - acc: 0.4535 - 1s/step
Eval samples: 9984
Epoch 3/12
step 65/65 [==============================] - loss: 1.3941 - acc: 0.4674 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.2871 - acc: 0.5361 - 1s/step
Eval samples: 9984
Epoch 4/12
step 65/65 [==============================] - loss: 1.1553 - acc: 0.5349 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.1521 - acc: 0.5865 - 1s/step
Eval samples: 9984
Epoch 5/12
step 65/65 [==============================] - loss: 1.0840 - acc: 0.5911 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.1133 - acc: 0.6048 - 1s/step
Eval samples: 9984
Epoch 6/12
step 65/65 [==============================] - loss: 0.9457 - acc: 0.6399 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.9893 - acc: 0.6483 - 1s/step
Eval samples: 9984
Epoch 7/12
step 65/65 [==============================] - loss: 0.8762 - acc: 0.6757 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.9500 - acc: 0.6715 - 1s/step
Eval samples: 9984
Epoch 8/12
step 65/65 [==============================] - loss: 0.7918 - acc: 0.7129 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.8281 - acc: 0.6980 - 1s/step
Eval samples: 9984
Epoch 9/12
step 65/65 [==============================] - loss: 0.7521 - acc: 0.7436 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.8672 - acc: 0.6927 - 1s/step
Eval samples: 9984
Epoch 10/12
step 65/65 [==============================] - loss: 0.6076 - acc: 0.7703 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.8094 - acc: 0.7266 - 1s/step
Eval samples: 9984
Epoch 11/12
step 65/65 [==============================] - loss: 0.5716 - acc: 0.8016 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.7376 - acc: 0.7358 - 1s/step
Eval samples: 9984
Epoch 12/12
step 65/65 [==============================] - loss: 0.5628 - acc: 0.8159 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.7734 - acc: 0.7325 - 1s/step
Eval samples: 9984
from paddle.vision.models import AlexNet
alexnet = AlexNet(num_classes=10)
paddle.summary(alexnet,(1,3,224,224))
---------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===========================================================================
Conv2D-11 [[1, 3, 224, 224]] [1, 64, 55, 55] 23,296
ReLU-15 [[1, 64, 55, 55]] [1, 64, 55, 55] 0
MaxPool2D-1 [[1, 64, 55, 55]] [1, 64, 27, 27] 0
ConvPoolLayer-1 [[1, 3, 224, 224]] [1, 64, 27, 27] 0
Conv2D-12 [[1, 64, 27, 27]] [1, 192, 27, 27] 307,392
ReLU-16 [[1, 192, 27, 27]] [1, 192, 27, 27] 0
MaxPool2D-2 [[1, 192, 27, 27]] [1, 192, 13, 13] 0
ConvPoolLayer-2 [[1, 64, 27, 27]] [1, 192, 13, 13] 0
Conv2D-13 [[1, 192, 13, 13]] [1, 384, 13, 13] 663,936
Conv2D-14 [[1, 384, 13, 13]] [1, 256, 13, 13] 884,992
Conv2D-15 [[1, 256, 13, 13]] [1, 256, 13, 13] 590,080
ReLU-17 [[1, 256, 13, 13]] [1, 256, 13, 13] 0
MaxPool2D-3 [[1, 256, 13, 13]] [1, 256, 6, 6] 0
ConvPoolLayer-3 [[1, 256, 13, 13]] [1, 256, 6, 6] 0
Dropout-5 [[1, 9216]] [1, 9216] 0
Linear-7 [[1, 9216]] [1, 4096] 37,752,832
Dropout-6 [[1, 4096]] [1, 4096] 0
Linear-8 [[1, 4096]] [1, 4096] 16,781,312
Linear-9 [[1, 4096]] [1, 10] 40,970
===========================================================================
Total params: 57,044,810
Trainable params: 57,044,810
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 8.09
Params size (MB): 217.61
Estimated Total Size (MB): 226.27
---------------------------------------------------------------------------
{'total_params': 57044810, 'trainable_params': 57044810}
import paddle
from paddle.metric import Accuracy
from paddle.vision.transforms import Compose, Normalize, Resize, ToTensor

callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir_alexnet')

# Same preprocessing pipeline as for the anti-aliased model above
transform = Compose([Resize(size=(224, 224)),
                     ToTensor(),
                     Normalize(mean=[0.5, 0.5, 0.5],
                               std=[0.5, 0.5, 0.5],
                               data_format='CHW')])

cifar10_train = paddle.vision.datasets.Cifar10(mode='train', transform=transform)
cifar10_test = paddle.vision.datasets.Cifar10(mode='test', transform=transform)

# Build the training data loader
train_loader = paddle.io.DataLoader(cifar10_train, batch_size=768, shuffle=True, drop_last=True)
# Build the test data loader
test_loader = paddle.io.DataLoader(cifar10_test, batch_size=768, shuffle=False, drop_last=True)

alexnet = paddle.Model(AlexNet(num_classes=10))
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=alexnet.parameters())
alexnet.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
)
alexnet.fit(train_data=train_loader,
            eval_data=test_loader,
            epochs=12,
            callbacks=callback,
            verbose=1
            )
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/12
step 65/65 [==============================] - loss: 1.9082 - acc: 0.1987 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.8753 - acc: 0.2979 - 1s/step
Eval samples: 9984
Epoch 2/12
step 65/65 [==============================] - loss: 1.5510 - acc: 0.3399 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.6200 - acc: 0.4145 - 1s/step
Eval samples: 9984
Epoch 3/12
step 65/65 [==============================] - loss: 1.3989 - acc: 0.4385 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.3316 - acc: 0.5037 - 1s/step
Eval samples: 9984
Epoch 4/12
step 65/65 [==============================] - loss: 1.2122 - acc: 0.5202 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.2373 - acc: 0.5680 - 1s/step
Eval samples: 9984
Epoch 5/12
step 65/65 [==============================] - loss: 1.2171 - acc: 0.5808 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 1.1474 - acc: 0.5967 - 1s/step
Eval samples: 9984
Epoch 6/12
step 65/65 [==============================] - loss: 1.0408 - acc: 0.6302 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.9405 - acc: 0.6544 - 1s/step
Eval samples: 9984
Epoch 7/12
step 65/65 [==============================] - loss: 0.9166 - acc: 0.6697 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.8147 - acc: 0.6761 - 1s/step
Eval samples: 9984
Epoch 8/12
step 65/65 [==============================] - loss: 0.8914 - acc: 0.7045 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.8807 - acc: 0.6971 - 1s/step
Eval samples: 9984
Epoch 9/12
step 65/65 [==============================] - loss: 0.7307 - acc: 0.7305 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.8092 - acc: 0.7222 - 1s/step
Eval samples: 9984
Epoch 10/12
step 65/65 [==============================] - loss: 0.6728 - acc: 0.7541 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.8192 - acc: 0.7256 - 1s/step
Eval samples: 9984
Epoch 11/12
step 65/65 [==============================] - loss: 0.6788 - acc: 0.7832 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.7691 - acc: 0.7382 - 1s/step
Eval samples: 9984
Epoch 12/12
step 65/65 [==============================] - loss: 0.5919 - acc: 0.8004 - 1s/step
Eval begin...
step 13/13 [==============================] - loss: 0.7637 - acc: 0.7544 - 1s/step
Eval samples: 9984
6. Visualizing the Training Process
The loss and accuracy curves of the two models can be compared in VisualDL using the visualdl_log_dir_alexnetnmp and visualdl_log_dir_alexnet directories written by the callbacks above.
7. Experimental Results
| Model | Train Acc | Eval Acc |
| --- | --- | --- |
| AlexNet_Anti | 0.8159 | 0.7325 |
| AlexNet | 0.8004 | 0.7544 |
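Note that the paper's headline metric is not accuracy but shift consistency: how often the predicted class stays the same when the input is shifted by a few pixels. The sketch below (an addition of this write-up, not code from the original project) measures it for the trained models above, using paddle.roll for a circular shift, which is one common choice for this kind of evaluation.

import paddle
import numpy as np

# Shift-consistency check: the fraction of test images whose predicted
# class stays the same after a small random circular shift of the input.
def shift_consistency(model, dataset, max_shift=4, n_samples=512):
    net = model.network  # the nn.Layer wrapped by paddle.Model
    net.eval()
    same = 0
    with paddle.no_grad():
        for i in range(n_samples):
            img, _ = dataset[i]
            x = paddle.to_tensor(img).unsqueeze(0)  # [1, 3, 224, 224]
            dy, dx = np.random.randint(-max_shift, max_shift + 1, size=2)
            x_shifted = paddle.roll(x, shifts=[int(dy), int(dx)], axis=[2, 3])
            p0 = paddle.argmax(net(x), axis=1).item()
            p1 = paddle.argmax(net(x_shifted), axis=1).item()
            same += int(p0 == p1)
    return same / n_samples

# e.g. compare the two trained models:
# print('anti-aliased:', shift_consistency(alexnmp, cifar10_test))
# print('baseline    :', shift_consistency(alexnet, cifar10_test))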
Summary
In this project, anti-aliased downsampling was reproduced and a comparison experiment was run. The results do not match the paper's, possibly because of the small number of training epochs or the choice of network structure; note also that the models above were instantiated with the default filter_size=1, so the BlurPool layers performed no actual blurring. What deserves the most attention, though, is the paper's broader point: when tackling problems in CNN-based methods, researchers often overlook results long established in classical signal processing.
This opens up a fresh line of thought for our own work: when we run into new problems in image processing, signal processing may well be a source of inspiration.