Model-Parameter EMA: Theory Explained with a Code Walkthrough
What is EMA?
The exponential moving average (EMA), also called the exponentially weighted moving average, estimates a local mean of a variable, so that each update depends on the variable's recent history rather than on the latest value alone. The EMA can be viewed as an average of the variable over a recent window: compared with taking the raw values directly, the averaged curve is smoother and less jittery, and a single anomalous value cannot swing it much. The update rule is

shadow_t = decay * shadow_{t-1} + (1 - decay) * value_t

where value_t is the current value of the variable, shadow_t is its moving average, and decay controls how much history is retained. In deep learning, EMA is often applied in exactly this way to a model's parameters, to improve test metrics and make the model more robust.
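As a quick numeric illustration (this snippet is an addition, not part of the original post), the code below applies the update rule above to a noisy ramp and shows that the averaged curve deviates far less from the underlying signal; the decay of 0.9 is an arbitrary demonstration value.

import numpy as np

np.random.seed(0)
ramp = np.linspace(0.0, 1.0, 100)                      # the underlying clean signal
signal = ramp + np.random.normal(scale=0.2, size=100)  # noisy observations

decay = 0.9                     # the decay rate in the update rule above
ema = np.zeros_like(signal)
ema[0] = signal[0]              # seed the average with the first observation
for t in range(1, len(signal)):
    ema[t] = decay * ema[t - 1] + (1 - decay) * signal[t]

print("std of raw deviation:", np.std(signal - ramp))  # roughly 0.2
print("std of EMA deviation:", np.std(ema - ramp))     # noticeably smaller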
Why does using EMA at test time usually improve model performance?
A moving average makes the model more robust on test data. As the common formulation goes: "When training a neural network with stochastic gradient descent, applying a moving average to the parameters can, in many applications, improve the final model's performance on test data to some degree."
Concretely, apply a moving average to the network weights (weights) to obtain the corresponding shadow variables (shadow_weights). Training still uses the original, un-averaged weights to compute the next update, and the updated weights are then folded into shadow_weights. At test time, shadow_weights are substituted for weights as the network's weights, which typically performs better on test data.
The underlying assumption is that, in the final convergence phase, the model's weights oscillate around an optimum. The average of the weights over this phase is therefore a better representative of the final trained model and gives it additional robustness.
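Before turning to the Paddle implementation, here is a minimal framework-agnostic sketch of the shadow-weights mechanism; grad_fn is a hypothetical placeholder for whatever computes a gradient, not a real API:

import numpy as np

def sgd_with_shadow(weights, grad_fn, lr=0.1, decay=0.999, steps=1000):
    # Train with plain SGD while keeping an EMA copy of the weights.
    # grad_fn(weights) -> gradient (hypothetical placeholder).
    shadow = weights.copy()
    for _ in range(steps):
        weights = weights - lr * grad_fn(weights)        # normal training update
        shadow = decay * shadow + (1 - decay) * weights  # shadow update
    # Use `shadow` (not `weights`) for evaluation.
    return weights, shadow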
Code walkthrough
The implementation below is adapted mainly from the one in PaddleDetection.
#! /usr/bin/env python
# coding=utf-8
# ================================================================
#
# Author : PaddleDetection
# Created date:
# Description :
#
# ================================================================
import paddle
import numpy as np


class ExponentialMovingAverage():
    def __init__(self, model, decay, thres_steps=True):
        self._model = model
        self._decay = decay
        self._thres_steps = thres_steps
        self._shadow = {}
        self._backup = {}

    def register(self):
        self._update_step = 0
        for name, param in self._model.named_parameters():
            # Only trainable parameters are tracked. The running means and
            # variances of BN layers have stop_gradient=True by default,
            # so they are not recorded here.
            if param.stop_gradient is False:
                self._shadow[name] = param.numpy().copy()

    def update(self):
        # Optional warm-up: early in training the shadow weights are still
        # unreliable, so the effective decay starts small and approaches
        # self._decay as the step count grows.
        decay = min(self._decay, (1 + self._update_step) / (10 + self._update_step)) if self._thres_steps else self._decay
        for name, param in self._model.named_parameters():
            if param.stop_gradient is False:
                assert name in self._shadow
                new_val = np.array(param.numpy().copy())
                old_val = np.array(self._shadow[name])
                new_average = decay * old_val + (1 - decay) * new_val
                self._shadow[name] = new_average
        self._update_step += 1
        return decay

    def apply(self):
        # Back up the live weights, then overwrite them with the shadow weights.
        for name, param in self._model.named_parameters():
            if param.stop_gradient is False:
                assert name in self._shadow
                self._backup[name] = np.array(param.numpy().copy())
                param.set_value(np.array(self._shadow[name]))

    def restore(self):
        # Put the backed-up live weights back after evaluation.
        for name, param in self._model.named_parameters():
            if param.stop_gradient is False:
                assert name in self._backup
                param.set_value(self._backup[name])
        self._backup = {}
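Before wiring the class into a training loop, it helps to see how the thres_steps warm-up in update() behaves. The short check below (added for illustration) evaluates the same min(...) expression at a few step counts; the effective decay starts small and only reaches the configured 0.9998 after many updates:

configured_decay = 0.9998
for step in (0, 10, 100, 1000, 100000):
    effective = min(configured_decay, (1 + step) / (10 + step))
    print("step {:>6}: effective decay = {:.4f}".format(step, effective))
# step      0: effective decay = 0.1000
# step     10: effective decay = 0.5500
# step    100: effective decay = 0.9182
# step   1000: effective decay = 0.9911
# step 100000: effective decay = 0.9998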
from paddle.vision.transforms import Compose, Normalize
from paddle.vision.datasets import MNIST
import paddle

# Preprocessing: normalize pixel values to [-1, 1].
transform = Compose([Normalize(mean=[127.5],
                               std=[127.5],
                               data_format='CHW')])

# Load the MNIST train and test sets with the preprocessing applied.
train_dataset = MNIST(mode='train', transform=transform)
test_dataset = MNIST(mode='test', transform=transform)
# Build the data loaders for the train and test sets.
train_loader = paddle.io.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = paddle.io.DataLoader(test_dataset, batch_size=64, shuffle=True)
import paddle
from paddle.vision.models import LeNet
import paddle.nn as nn
import paddle.nn.functional as F

mnist = LeNet()

# Initialize the EMA helper and register the model's trainable parameters.
ema = ExponentialMovingAverage(mnist, 0.9998)
ema.register()

mnist.train()
epochs = 2
# Use Adam as the optimizer.
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=mnist.parameters())

for epoch in range(epochs):
    for batch_id, data in enumerate(train_loader()):
        x_data = data[0]
        y_data = data[1]
        predicts = mnist(x_data)
        # Compute the loss and the (top-2) accuracy.
        loss = F.cross_entropy(predicts, y_data)
        acc = paddle.metric.accuracy(predicts, y_data, k=2)
        loss.backward()
        if batch_id % 5 == 0:
            print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy()))
        optim.step()
        optim.clear_grad()
        # After each parameter update, update the shadow weights in step.
        ema.update()

# Before evaluation (or saving), apply the shadow weights; call restore()
# afterwards if training is to continue with the original weights.
ema.apply()
save_path = 'test.pdparams'
paddle.save(mnist.state_dict(), save_path)
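If you also want to evaluate with the EMA weights during training rather than only before saving, the intended pattern is apply, then evaluate, then restore. A brief sketch reusing the objects defined above:

ema.apply()    # swap the shadow weights in
mnist.eval()
# ... run an evaluation pass over test_loader here ...
mnist.train()
ema.restore()  # hand the live weights back so training continues unaffected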
Loading and validating the saved parameters
import paddle
from paddle.vision.models import LeNet
import paddle.nn as nn
import paddle.nn.functional as F

mnist = LeNet()
# Load the saved (EMA) parameters into a fresh model.
load_layer_state_dict = paddle.load("test.pdparams")
mnist.set_state_dict(load_layer_state_dict)
mnist.eval()  # switch to inference mode before prediction

# Run prediction over the test set batch by batch.
for batch_id, data in enumerate(test_loader()):
    x_data = data[0]
    y_data = data[1]
    predicts = mnist(x_data)
    # Compute the loss and the (top-2) accuracy.
    loss = F.cross_entropy(predicts, y_data)
    acc = paddle.metric.accuracy(predicts, y_data, k=2)
    if batch_id % 5 == 0:
        print("batch_id: {}, loss is: {}, acc is: {}".format(batch_id, loss.numpy(), acc.numpy()))
batch_id: 0, loss is: [0.00824078], acc is: [1.]
batch_id: 5, loss is: [0.02751699], acc is: [1.]
batch_id: 10, loss is: [0.04925193], acc is: [0.984375]
batch_id: 15, loss is: [0.06211995], acc is: [0.984375]
batch_id: 20, loss is: [0.10744438], acc is: [0.984375]
batch_id: 25, loss is: [0.03402979], acc is: [1.]
batch_id: 30, loss is: [0.01336599], acc is: [1.]
batch_id: 35, loss is: [0.01195791], acc is: [1.]
batch_id: 40, loss is: [0.03601253], acc is: [1.]
batch_id: 45, loss is: [0.02507739], acc is: [1.]
batch_id: 50, loss is: [0.02883011], acc is: [1.]
batch_id: 55, loss is: [0.00986674], acc is: [1.]
batch_id: 60, loss is: [0.12979744], acc is: [1.]
batch_id: 65, loss is: [0.01078999], acc is: [1.]
batch_id: 70, loss is: [0.09174839], acc is: [0.984375]
batch_id: 75, loss is: [0.0538558], acc is: [1.]
batch_id: 80, loss is: [0.00910719], acc is: [1.]
batch_id: 85, loss is: [0.01266371], acc is: [1.]
batch_id: 90, loss is: [0.04465353], acc is: [1.]
batch_id: 95, loss is: [0.16671169], acc is: [0.984375]
batch_id: 100, loss is: [0.04313185], acc is: [1.]
batch_id: 105, loss is: [0.03688852], acc is: [1.]
batch_id: 110, loss is: [0.03034054], acc is: [1.]
batch_id: 115, loss is: [0.08325765], acc is: [1.]
batch_id: 120, loss is: [0.01138867], acc is: [1.]
batch_id: 125, loss is: [0.0183015], acc is: [1.]
batch_id: 130, loss is: [0.07376026], acc is: [1.]
batch_id: 135, loss is: [0.003135], acc is: [1.]
batch_id: 140, loss is: [0.0204536], acc is: [1.]
batch_id: 145, loss is: [0.00931214], acc is: [1.]
batch_id: 150, loss is: [0.04892552], acc is: [1.]
batch_id: 155, loss is: [0.02909346], acc is: [1.]
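The per-batch prints above are noisy. If a single summary number is preferred, a small addition (not part of the original) can average the accuracy over the whole test set, reusing the model and loader defined above:

accs = []
for data in test_loader():
    x_data, y_data = data
    predicts = mnist(x_data)
    accs.append(paddle.metric.accuracy(predicts, y_data, k=2).numpy())
print("mean top-2 accuracy over the test set:", float(np.mean(accs)))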
Summary
This tutorial walked through how to implement EMA under the Paddle 2.0 framework. EMA is a very practical trick for improving accuracy: it adds no parameters to the model, yet makes it more robust. No controlled comparison is included here, since the focus is on the implementation itself and the effectiveness of EMA has already been validated in many papers. On the small dataset and model used in this experiment, the effect of EMA on the final metrics may not be visible at all, so readers who want to verify its benefit should run a with/without-EMA comparison on a larger task themselves.