Six PyTorch Learning Rate Scheduling Strategies (with Code)

AI浩, posted 2022/01/18 06:39:42

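 Contents

Abstract

1. Fixed-interval decay: StepLR

Parameters

Example

Result

Note:

2. Milestone-based decay: MultiStepLR

Parameters:

Example:

Result:

3. Exponential decay: ExponentialLR

Parameters:

Example

Result

4. Cosine annealing: CosineAnnealingLR

Parameters

Example

Result

5. Adaptive adjustment: ReduceLROnPlateau

Parameters

Example

Result

6. Custom adjustment: LambdaLR

Parameters

Example

Result

Summary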


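Abstract

   PyTorch implements learning rate scheduling through the torch.optim.lr_scheduler interface. The strategies it provides fall into three broad categories:

  1. Scheduled adjustment: fixed-interval decay (StepLR), milestone-based decay (MultiStepLR), exponential decay (ExponentialLR), and cosine annealing (CosineAnnealingLR).
  2. Adaptive adjustment: ReduceLROnPlateau.
  3. Custom adjustment: LambdaLR.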

1. Fixed-interval decay: StepLR

Multiplies the learning rate by gamma once every step_size steps. Note that a "step" here normally means an epoch (one call to scheduler.step() per epoch), not an iteration.

torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)

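Parameters

step_size (int) - interval between decays. With step_size=30, the learning rate is multiplied by gamma at steps 30, 60, 90, ...
gamma (float) - multiplicative decay factor. The default is 0.1, i.e. the learning rate drops to one tenth of its previous value.
last_epoch (int) - index of the last epoch, which the scheduler uses to decide whether the learning rate is due for an adjustment. With the default of -1, training starts from the initial learning rate.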
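
To make the decay rule concrete, here is a minimal pure-Python sketch of the closed-form StepLR schedule. The helper name step_lr is made up for illustration and is not part of PyTorch; it only mirrors the rule described above (multiply by gamma once for every completed interval of step_size epochs).

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float = 0.1) -> float:
    """Closed-form StepLR value after `epoch` scheduler steps (illustrative helper)."""
    # One factor of gamma is applied for every completed interval of step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


# Print the learning rate just before and at each decay boundary.
for epoch in (0, 29, 30, 59, 60, 90):
    print(epoch, step_lr(0.1, epoch, step_size=30))
```

The value stays constant inside each interval and only changes at multiples of step_size.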

Example

import torch
from torch import nn, optim
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)

y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)


# Linear Regression Model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear1 = nn.Linear(1, 5)  # 1 input feature -> 5 hidden features
        self.linear2 = nn.Linear(5, 1)

    def forward(self, x):
        out = self.linear1(x)
        out = self.linear2(out)
        return out


model = LinearRegression()
print(model.linear1)
# Define the loss function and the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(
    [{"params": model.linear1.parameters(), "lr": 0.01},
     {"params": model.linear2.parameters()}],
    lr=0.02)
step_schedule = optim.lr_scheduler.StepLR(step_size=20, gamma=0.9, optimizer=optimizer)
step_lr_list = []
loss_list = []
# Start training
num_epochs = 240
for epoch in range(num_epochs):
    inputs = Variable(x_train)
    target = Variable(y_train)
    # forward
    out = model(inputs)
    loss = criterion(out, target)
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    step_schedule.step()
    step_lr_list.append(step_schedule.get_last_lr()[0])
    loss_list.append(loss.item())
plt.subplot(121)
plt.plot(range(len(loss_list)), loss_list, label="loss")
plt.legend()
plt.subplot(122)
plt.plot(range(len(step_lr_list)), step_lr_list, label="step_lr")
plt.legend()
plt.show()

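Result

(Figure omitted: training loss in the left panel, learning rate in the right panel.)

Note:

Call scheduler.step() after the optimizer update. If scheduler.step() is placed before optimizer.step(), the first value of the learning rate schedule is skipped.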

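2. Milestone-based decay: MultiStepLR

Decays the learning rate at a user-specified list of epochs rather than at a fixed interval. This is handy in later tuning: after inspecting the loss curve, pass the epochs at which to decay through the milestones argument.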

torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)

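Parameters:

milestones (list) - a list whose elements are the epochs at which to adjust the learning rate; the elements must be increasing, e.g. milestones=[30,80,120].
gamma (float) - multiplicative decay factor. The default is 0.1, i.e. the learning rate drops to one tenth of its previous value.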
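
As a rough illustration of how milestones translate into learning rates, the following pure-Python sketch counts how many milestones have been reached and applies gamma that many times. The helper multistep_lr is a hypothetical name for this example, not a PyTorch function.

```python
from bisect import bisect_right


def multistep_lr(base_lr: float, epoch: int, milestones, gamma: float = 0.1) -> float:
    """Closed-form MultiStepLR value after `epoch` scheduler steps (illustrative helper)."""
    # bisect_right returns the number of milestones that are <= epoch.
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** passed


# Print the learning rate just before and at each milestone.
for epoch in (0, 119, 120, 179, 180, 239):
    print(epoch, multistep_lr(0.01, epoch, milestones=[120, 180]))
```

With milestones=[120, 180], the rate is unchanged up to epoch 119, drops once at epoch 120, and drops again at epoch 180.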

Example:

import torch
from torch import nn, optim
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)

y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)


# Linear Regression Model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear1 = nn.Linear(1, 5)  # 1 input feature -> 5 hidden features
        self.linear2 = nn.Linear(5, 1)

    def forward(self, x):
        out = self.linear1(x)
        out = self.linear2(out)
        return out


model = LinearRegression()
print(model.linear1)
# Define the loss function and the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(
    [{"params": model.linear1.parameters(), "lr": 0.01},
     {"params": model.linear2.parameters()}],
    lr=0.02)
multi_schedule = optim.lr_scheduler.MultiStepLR(optimizer=optimizer,milestones=[120,180])
multi_list = []
loss_list = []
# Start training
num_epochs = 240
for epoch in range(num_epochs):
    inputs = Variable(x_train)
    target = Variable(y_train)
    # forward
    out = model(inputs)
    loss = criterion(out, target)
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    multi_schedule.step()
    multi_list.append(multi_schedule.get_last_lr()[0])
    loss_list.append(loss.item())
plt.subplot(121)
plt.plot(range(len(loss_list)), loss_list, label="loss")
plt.legend()
plt.subplot(122)
plt.plot(range(len(multi_list)), multi_list, label="multistep_lr")
plt.legend()
plt.show()

Result:

(Figure omitted: training loss in the left panel, learning rate in the right panel.)

3. Exponential decay: ExponentialLR

Decays the learning rate exponentially: lr = initial_lr * gamma^e, where e is the epoch number.

torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)

Parameters:

gamma - base of the exponential decay factor; the exponent is the epoch number.
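
The sketch below evaluates the exponential rule directly and estimates how quickly a given gamma shrinks the learning rate. Both exponential_lr and half_life are hypothetical helper names for this illustration, not PyTorch APIs.

```python
import math


def exponential_lr(base_lr: float, epoch: int, gamma: float) -> float:
    """Closed-form ExponentialLR value: base_lr * gamma ** epoch (illustrative helper)."""
    return base_lr * gamma ** epoch


def half_life(gamma: float) -> float:
    """Number of epochs after which the learning rate has halved, for 0 < gamma < 1."""
    return math.log(0.5) / math.log(gamma)


print(exponential_lr(0.01, 10, gamma=0.9))
print(half_life(0.9))
```

With gamma=0.9 the learning rate halves roughly every 6 to 7 epochs, so over a few hundred epochs it becomes negligibly small; a gamma closer to 1 is usually needed for long runs.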

Example

import torch
from torch import nn, optim
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)

y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)


# Linear Regression Model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear1 = nn.Linear(1, 5)  # 1 input feature -> 5 hidden features
        self.linear2 = nn.Linear(5, 1)

    def forward(self, x):
        out = self.linear1(x)
        out = self.linear2(out)
        return out


model = LinearRegression()
print(model.linear1)
# Define the loss function and the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(
    [{"params": model.linear1.parameters(), "lr": 0.01},
     {"params": model.linear2.parameters()}],
    lr=0.02)
exponent_schedule = optim.lr_scheduler.ExponentialLR(optimizer=optimizer,gamma=0.9)
multi_list = []
loss_list = []
# Start training
num_epochs = 240
for epoch in range(num_epochs):
    inputs = Variable(x_train)
    target = Variable(y_train)
    # forward
    out = model(inputs)
    loss = criterion(out, target)
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    exponent_schedule.step()
    multi_list.append(exponent_schedule.get_last_lr()[0])
    loss_list.append(loss.item())
plt.subplot(121)
plt.plot(range(len(loss_list)), loss_list, label="loss")
plt.legend()
plt.subplot(122)
plt.plot(range(len(multi_list)), multi_list, label="exponential_lr")
plt.legend()
plt.show()

Result

(Figure omitted: training loss in the left panel, learning rate in the right panel.)


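4. Cosine annealing: CosineAnnealingLR

The learning rate follows a cosine curve. The initial learning rate is the maximum and eta_min is the minimum. The period is 2*T_max: within one period the learning rate first decreases for T_max epochs and then rises again, returning to the initial value at the start of the next period.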

torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)

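Parameters

T_max (int) - number of epochs in half a cosine period, i.e. the number of epochs over which the learning rate goes from its initial value down to eta_min.
eta_min (float) - minimum learning rate, i.e. the lowest value reached within a period. The default is 0.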
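
The shape of the schedule is easier to see from the closed-form cosine expression. The sketch below is a pure-Python illustration; cosine_lr is a hypothetical helper name, and the formula is the standard cosine annealing expression rather than a copy of PyTorch's internal code.

```python
import math


def cosine_lr(base_lr: float, epoch: int, t_max: int, eta_min: float = 0.0) -> float:
    """Cosine-annealed learning rate after `epoch` scheduler steps (illustrative helper)."""
    # Starts at base_lr, reaches eta_min at epoch == t_max, and is periodic with period 2 * t_max.
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


# Sample the curve at the start, quarter, half, three-quarter, and full period.
for epoch in (0, 10, 20, 30, 40):
    print(epoch, cosine_lr(0.02, epoch, t_max=20, eta_min=0.0004))
```

With t_max=20, the rate falls from the initial value to eta_min over the first 20 epochs and climbs back over the next 20.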

Example

import torch
from torch import nn, optim
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)

y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)


# Linear Regression Model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear1 = nn.Linear(1, 5)  # 1 input feature -> 5 hidden features
        self.linear2 = nn.Linear(5, 1)

    def forward(self, x):
        out = self.linear1(x)
        out = self.linear2(out)
        return out


model = LinearRegression()
print(model.linear1)
# Define the loss function and the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(
    [{"params": model.linear1.parameters(), "lr": 0.01},
     {"params": model.linear2.parameters()}],
    lr=0.02)
cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer,T_max=20,eta_min=0.0004)
multi_list = []
loss_list = []
# Start training
num_epochs = 240
for epoch in range(num_epochs):
    inputs = Variable(x_train)
    target = Variable(y_train)
    # forward
    out = model(inputs)
    loss = criterion(out, target)
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    cosine_schedule.step()
    multi_list.append(cosine_schedule.get_last_lr()[0])
    loss_list.append(loss.item())
plt.subplot(121)
plt.plot(range(len(loss_list)), loss_list, label="loss")
plt.legend()
plt.subplot(122)
plt.plot(range(len(multi_list)), multi_list, label="cosine_lr")
plt.legend()
plt.show()

Result

(Figure omitted: training loss in the left panel, learning rate in the right panel.)


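5. Adaptive adjustment: ReduceLROnPlateau

Adjusts the learning rate when a monitored metric stops improving (stops decreasing or stops increasing). This is a very practical strategy.
For example, reduce the learning rate when the validation loss stops decreasing, or monitor validation accuracy and reduce the learning rate when accuracy stops increasing.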

torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)

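Parameters

mode (str) - either min or max. min triggers when the metric stops decreasing (e.g. when monitoring loss); max triggers when the metric stops increasing (e.g. when monitoring accuracy).
factor (float) - multiplicative factor (the counterpart of gamma in the other schedulers); the learning rate is updated as lr = lr * factor.
patience (int) - number of steps with no improvement to tolerate before the learning rate is reduced.
verbose (bool) - whether to print a message on each update, in the form print('Epoch {:5d}: reducing learning rate of group {} to {:.4e}.'.format(epoch, i, new_lr))
threshold_mode (str) - how to decide whether the metric has reached a new best; either rel or abs.
When threshold_mode == rel and mode == max, dynamic_threshold = best * (1 + threshold);
When threshold_mode == rel and mode == min, dynamic_threshold = best * (1 - threshold);
When threshold_mode == abs and mode == max, dynamic_threshold = best + threshold;
When threshold_mode == abs and mode == min, dynamic_threshold = best - threshold.

threshold (float) - used together with threshold_mode.
cooldown (int) - "cool-down" period: the number of steps to wait after a reduction, letting the model train for a while, before monitoring resumes.
min_lr (float or list) - lower bound on the learning rate; either a float, or a list with one entry per parameter group.
eps (float) - minimal decay applied to the learning rate; if the difference between the old and new learning rate is smaller than eps, the update is skipped.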
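
The interaction between mode, threshold, and patience can be sketched in plain Python. Both is_better and simulate_plateau are hypothetical helpers written for this illustration, not PyTorch functions; the simulation is deliberately simplified, ignores cooldown and eps, and assumes mode='min' with the relative threshold.

```python
def is_better(current, best, mode="min", threshold=1e-4, threshold_mode="rel"):
    """Return True if `current` improves on `best` by more than the dynamic threshold."""
    if mode == "min" and threshold_mode == "rel":
        return current < best * (1 - threshold)
    if mode == "min" and threshold_mode == "abs":
        return current < best - threshold
    if mode == "max" and threshold_mode == "rel":
        return current > best * (1 + threshold)
    return current > best + threshold  # mode == "max", threshold_mode == "abs"


def simulate_plateau(losses, lr=0.1, factor=0.1, patience=2, min_lr=0.0):
    """Replay a list of losses and return the learning rate in effect after each one."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if is_better(loss, best):
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        # Reduce only once the number of non-improving epochs exceeds patience.
        if bad_epochs > patience:
            lr = max(lr * factor, min_lr)
            bad_epochs = 0
        history.append(lr)
    return history


print(simulate_plateau([1.0, 0.5, 0.5, 0.5, 0.5, 0.5]))
```

In this toy run the loss stalls at 0.5, and the reduction happens on the third consecutive non-improving epoch because patience is 2.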

Example

import torch
from torch import nn, optim
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)

y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)


# Linear Regression Model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear1 = nn.Linear(1, 5)  # 1 input feature -> 5 hidden features
        self.linear2 = nn.Linear(5, 1)

    def forward(self, x):
        out = self.linear1(x)
        out = self.linear2(out)
        return out


model = LinearRegression()
print(model.linear1)
# Define the loss function and the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(
    [{"params": model.linear1.parameters(), "lr": 0.01},
     {"params": model.linear2.parameters()}],
    lr=0.02)
reduce_schedule = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10,
                 verbose=False, threshold=1e-4, threshold_mode='rel',
                 cooldown=0, min_lr=0, eps=1e-8)
multi_list = []
loss_list = []
# Start training
num_epochs = 240
for epoch in range(num_epochs):
    inputs = Variable(x_train)
    target = Variable(y_train)
    # forward
    out = model(inputs)
    loss = criterion(out, target)
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    reduce_schedule.step(loss)
    multi_list.append(optimizer.param_groups[0]["lr"])
    loss_list.append(loss.item())
plt.subplot(121)
plt.plot(range(len(loss_list)), loss_list, label="loss")
plt.legend()
plt.subplot(122)
plt.plot(range(len(multi_list)), multi_list, label="reduce_lr")
plt.legend()
plt.show()

Result

(Figure omitted: training loss in the left panel, learning rate in the right panel.)

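6. Custom adjustment: LambdaLR

Lets each parameter group follow its own schedule: the learning rate of each group is set to its initial lr multiplied by the value of a given function.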

torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)

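Parameters

lr_lambda (function or list) - a function that takes an integer argument (usually the epoch number) and returns a multiplicative factor for the learning rate, or a list of such functions, one per parameter group.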
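
As a small illustration of what an lr_lambda function might look like, the sketch below defines a linear warm-up followed by exponential decay and applies it to a base learning rate. The names warmup_then_decay and lambda_lr, and the values warmup_epochs=5 and decay=0.95, are assumptions chosen for this example; only the rule "lr = initial lr * lr_lambda(epoch)" comes from the description above.

```python
def warmup_then_decay(epoch: int, warmup_epochs: int = 5, decay: float = 0.95) -> float:
    """Multiplicative factor: linear warm-up to 1.0, then exponential decay."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return decay ** (epoch - warmup_epochs)


def lambda_lr(base_lr: float, epoch: int, lr_lambda) -> float:
    """Learning rate under the LambdaLR rule: base_lr times the factor for this epoch."""
    return base_lr * lr_lambda(epoch)


# Print the learning rate during warm-up and during decay.
for epoch in (0, 2, 4, 5, 6, 20):
    print(epoch, lambda_lr(0.02, epoch, warmup_then_decay))
```

Note that a factor function such as epoch // 20 returns 0 for the first 20 epochs, so the corresponding parameter group is effectively frozen during that time.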

Example

import torch
from torch import nn, optim
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
                    [9.779], [6.182], [7.59], [2.167], [7.042],
                    [10.791], [5.313], [7.997], [3.1]], dtype=np.float32)

y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
                    [3.366], [2.596], [2.53], [1.221], [2.827],
                    [3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)


# Linear Regression Model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear1 = nn.Linear(1, 5)  # 1 input feature -> 5 hidden features
        self.linear2 = nn.Linear(5, 1)

    def forward(self, x):
        out = self.linear1(x)
        out = self.linear2(out)
        return out

lambda1 = lambda epoch: epoch//20
lambda2 = lambda epoch: 0.95**epoch
model = LinearRegression()
print(model.linear1)
# Define the loss function and the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(
    [{"params": model.linear1.parameters(), "lr": 0.01},
     {"params": model.linear2.parameters()}],
    lr=0.02)
lambda_schedule = optim.lr_scheduler.LambdaLR(optimizer=optimizer,lr_lambda=[lambda1,lambda2])
lambda1_list = []
lambda2_list = []
loss_list = []
# Start training
num_epochs = 240
for epoch in range(num_epochs):
    inputs = Variable(x_train)
    target = Variable(y_train)
    # forward
    out = model(inputs)
    loss = criterion(out, target)
    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    lambda_schedule.step()
    lambda1_list.append(optimizer.param_groups[0]["lr"])
    lambda2_list.append(optimizer.param_groups[1]["lr"])
    loss_list.append(loss.item())
plt.subplot(121)
plt.plot(range(len(loss_list)), loss_list, label="loss")
plt.legend()
plt.subplot(122)
plt.plot(range(len(lambda1_list)),lambda1_list,label="lambda1_lr")
plt.plot(range(len(lambda2_list)),lambda2_list,label="lambda2_lr")
plt.legend()
plt.show()

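Result

(Figure omitted: training loss in the left panel, learning rates of the two parameter groups in the right panel.)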

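Summary

  Having seen all of the schedulers above, you will surely ask which one works best. It really depends on the situation. Generally speaking, fixed-interval decay (StepLR), adaptive adjustment (ReduceLROnPlateau), and cosine annealing (CosineAnnealingLR) are the ones used most often. May your loss converge!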

