Gradient Descent Analysis Experiment
1 Experiment Overview
1. Implement gradient descent yourself, using the same classification algorithm (e.g., a neural network);
2. Compare and analyze it experimentally against existing gradient descent methods such as SGD and Adam;
3. Summarize the advantages and disadvantages of existing gradient descent methods.
2 Experiment Objective
- Master the gradient descent algorithm
3 Related Theory and Knowledge Points
(1) Principle of gradient descent
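For reference, the principle can be stated as a worked formula: gradient descent repeatedly moves the parameters a small step against the gradient of the loss. With parameters $\theta$, loss $L(\theta)$, and learning rate (step size) $\eta$, the update at iteration $t$ is

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)$$

SGD applies this rule on mini-batches of data, and methods such as Adam modify how the step is scaled, but all of them build on this same update.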
4 Experiment Tasks and Grading Criteria

| No. | Task Name | Task Requirements | Grading Criteria (out of 100) |
| --- | --- | --- | --- |
| 1 | Implement gradient descent using the same classification algorithm (e.g., a neural network); compare and analyze it experimentally against existing gradient descent methods such as SGD and Adam; summarize the advantages and disadvantages of existing gradient descent methods. | Implement gradient descent using the same classification algorithm (e.g., a neural network). Complete the work with the MindSpore framework and write the code in Jupyter. Development language: Python. | Submission requirements: complete the work with the MindSpore framework and Jupyter; submit the code to the Huawei Cloud platform, and also package the code and the report and upload them to the Lexue platform, named "student ID + name + gradient descent analysis". |
5 Experiment Conditions and Environment

| Requirement | Name | Version Requirement | Notes |
| --- | --- | --- | --- |
| Programming language | Python | | |
| Development environment | Jupyter | | |
| Third-party toolkits/libraries/plugins | MindSpore | | |
6 Experiment Steps and Code

Step 1: Define hyperparameters

Step description: Set the training batch size, learning rate, and number of training epochs.

Code and explanation:

```python
batch_size = 64        # mini-batch size
learning_rate = 0.01   # learning rate (step size)
num_epoches = 5        # number of training epochs
```

Output and interpretation: (none)
Step 2: Import and load the dataset

Step description: Download the MNIST dataset; once the download completes, dataset objects are obtained. MindSpore's dataset module uses a data processing pipeline, in which operations such as map, batch, and shuffle are specified. Here we use map to transform the image data and labels, and then batch the processed dataset with the configured batch size.

Code and explanation:

```python
import mindspore
from download import download
from mindspore.dataset import vision, transforms
from mindspore.dataset import MnistDataset

# Download and extract the MNIST dataset
url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/" \
      "notebook/datasets/MNIST_Data.zip"
path = download(url, "./", kind="zip", replace=True)

# Create separate train/test dataset objects for each optimizer
dataset_train_for_SGD = MnistDataset('MNIST_Data/train')
dataset_test_for_SGD = MnistDataset('MNIST_Data/test')
dataset_train_for_Adam = MnistDataset('MNIST_Data/train')
dataset_test_for_Adam = MnistDataset('MNIST_Data/test')
dataset_train_for_selfGD = MnistDataset('MNIST_Data/train')
dataset_test_for_selfGD = MnistDataset('MNIST_Data/test')

def datapipe(dataset, batch_size):
    # Rescale pixels to [0, 1], normalize, and convert HWC to CHW
    image_transforms = [
        vision.Rescale(1.0 / 255.0, 0),
        vision.Normalize(mean=(0.1307,), std=(0.3081,)),
        vision.HWC2CHW()
    ]
    label_transform = transforms.TypeCast(mindspore.int32)

    dataset = dataset.map(image_transforms, 'image')
    dataset = dataset.map(label_transform, 'label')
    dataset = dataset.batch(batch_size)
    return dataset

dataset_train_for_SGD = datapipe(dataset_train_for_SGD, batch_size)
dataset_test_for_SGD = datapipe(dataset_test_for_SGD, batch_size)
dataset_train_for_Adam = datapipe(dataset_train_for_Adam, batch_size)
dataset_test_for_Adam = datapipe(dataset_test_for_Adam, batch_size)
dataset_train_for_selfGD = datapipe(dataset_train_for_selfGD, batch_size)
dataset_test_for_selfGD = datapipe(dataset_test_for_selfGD, batch_size)
```

Output and interpretation:

```
Downloading data from https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/MNIST_Data.zip (10.3 MB)

file_sizes: 100%|███████████████████████████| 10.8M/10.8M [00:00<00:00, 147MB/s]
Extracting zip file...
Successfully downloaded / unzipped to ./
```

The dataset archive is downloaded and extracted to the current directory successfully.
Step 3: Define the model

Step description: Define a simple MLP model for classification.

Code and explanation:

```python
import mindspore.nn as nn

class Network(nn.Cell):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        # Three fully connected layers with ReLU activations: 784 -> 512 -> 512 -> 10
        self.dense_relu_sequential = nn.SequentialCell(
            nn.Dense(28*28, 512),
            nn.ReLU(),
            nn.Dense(512, 512),
            nn.ReLU(),
            nn.Dense(512, 10)
        )

    def construct(self, x):
        x = self.flatten(x)
        logits = self.dense_relu_sequential(x)
        return logits

network = Network()
```

Output and interpretation: (none)
Step 4: Define the loss function

Step description: Define the cross-entropy loss function.

Code and explanation:

```python
# Sparse softmax cross-entropy over the logits, averaged over the batch
net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
```

Output and interpretation: (none)
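As background, with sparse=True the loss takes the integer class label y and the logits z directly, and reduction='mean' averages the per-sample losses over the batch. For a single sample, the standard softmax cross-entropy is

$$\ell(z, y) = -\log \frac{e^{z_y}}{\sum_j e^{z_j}}$$

which is the quantity each optimizer below will minimize.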
Step 5: Define a custom gradient descent optimizer

Step description: Create an optimizer subclass that implements the simplest form of gradient descent.

Code and explanation:

```python
from mindspore import ops

class selfGD(nn.Optimizer):
    def __init__(self, params, learning_rate):
        super(selfGD, self).__init__(learning_rate, params)

    def construct(self, gradients):
        lr = self.get_lr()
        params = self.parameters
        # Plain gradient descent: theta <- theta - lr * gradient
        for i in range(len(params)):
            update = params[i] - gradients[i] * lr
            ops.assign(params[i], update)
        return params
```

Output and interpretation: (none)
Step 6: Define the optimizers

Step description: Define the SGD, Adam, and self-implemented selfGD optimizers.

Code and explanation:

```python
net_opt_SGD = nn.SGD(network.trainable_params(), learning_rate=learning_rate)
net_opt_Adam = nn.Adam(network.trainable_params(), learning_rate=learning_rate)
net_opt_selfGD = selfGD(network.trainable_params(), learning_rate=learning_rate)
```

Output and interpretation: (none)
Step 7: Train the model

Step description: Train the same model with each of the three optimizers.

Code and explanation:

```python
from mindvision.engine.callback import LossMonitor
from mindspore.train import Model

# One Model wrapper per optimizer
model_for_SGD = Model(network, loss_fn=net_loss, optimizer=net_opt_SGD, metrics={'accuracy'})
model_for_Adam = Model(network, loss_fn=net_loss, optimizer=net_opt_Adam, metrics={'accuracy'})
model_for_selfGD = Model(network, loss_fn=net_loss, optimizer=net_opt_selfGD, metrics={'accuracy'})

print('SGD training...')
model_for_SGD.train(num_epoches, dataset_train_for_SGD, callbacks=[LossMonitor(learning_rate, 300)])
print('Adam training...')
model_for_Adam.train(num_epoches, dataset_train_for_Adam, callbacks=[LossMonitor(learning_rate, 300)])
print('selfGD training...')
model_for_selfGD.train(num_epoches, dataset_train_for_selfGD, callbacks=[LossMonitor(learning_rate, 300)])
```

Output and interpretation:
```
SGD training...
Epoch:[ 0/ 5], step:[ 300/ 938], loss:[2.205/2.270], time:9.161 ms, lr:0.01000
Epoch:[ 0/ 5], step:[ 600/ 938], loss:[1.076/2.001], time:1.211 ms, lr:0.01000
Epoch:[ 0/ 5], step:[ 900/ 938], loss:[0.489/1.590], time:1.346 ms, lr:0.01000
Epoch time: 21175.992 ms, per step time: 22.576 ms, avg loss: 1.548
Epoch:[ 1/ 5], step:[ 262/ 938], loss:[0.361/0.495], time:1.345 ms, lr:0.01000
Epoch:[ 1/ 5], step:[ 562/ 938], loss:[0.473/0.454], time:79.919 ms, lr:0.01000
Epoch:[ 1/ 5], step:[ 862/ 938], loss:[0.361/0.426], time:1.656 ms, lr:0.01000
Epoch time: 18970.728 ms, per step time: 20.225 ms, avg loss: 0.421
Epoch:[ 2/ 5], step:[ 224/ 938], loss:[0.306/0.337], time:1.547 ms, lr:0.01000
Epoch:[ 2/ 5], step:[ 524/ 938], loss:[0.433/0.333], time:86.639 ms, lr:0.01000
Epoch:[ 2/ 5], step:[ 824/ 938], loss:[0.200/0.326], time:93.757 ms, lr:0.01000
Epoch time: 20900.811 ms, per step time: 22.282 ms, avg loss: 0.322
Epoch:[ 3/ 5], step:[ 186/ 938], loss:[0.207/0.278], time:1.352 ms, lr:0.01000
Epoch:[ 3/ 5], step:[ 486/ 938], loss:[0.317/0.282], time:1.412 ms, lr:0.01000
Epoch:[ 3/ 5], step:[ 786/ 938], loss:[0.656/0.272], time:1.775 ms, lr:0.01000
Epoch time: 20902.115 ms, per step time: 22.284 ms, avg loss: 0.271
Epoch:[ 4/ 5], step:[ 148/ 938], loss:[0.076/0.251], time:85.664 ms, lr:0.01000
Epoch:[ 4/ 5], step:[ 448/ 938], loss:[0.119/0.246], time:1.291 ms, lr:0.01000
Epoch:[ 4/ 5], step:[ 748/ 938], loss:[0.243/0.235], time:1.466 ms, lr:0.01000
Epoch time: 18496.216 ms, per step time: 19.719 ms, avg loss: 0.232
Adam training...
Epoch:[ 0/ 5], step:[ 300/ 938], loss:[0.276/0.669], time:1.306 ms, lr:0.01000
Epoch:[ 0/ 5], step:[ 600/ 938], loss:[0.386/0.475], time:93.253 ms, lr:0.01000
Epoch:[ 0/ 5], step:[ 900/ 938], loss:[0.207/0.405], time:1.714 ms, lr:0.01000
Epoch time: 23533.298 ms, per step time: 25.089 ms, avg loss: 0.400
Epoch:[ 1/ 5], step:[ 262/ 938], loss:[0.354/0.228], time:1.341 ms, lr:0.01000
Epoch:[ 1/ 5], step:[ 562/ 938], loss:[0.258/0.230], time:93.148 ms, lr:0.01000
Epoch:[ 1/ 5], step:[ 862/ 938], loss:[0.098/0.226], time:1.542 ms, lr:0.01000
Epoch time: 22967.152 ms, per step time: 24.485 ms, avg loss: 0.226
Epoch:[ 2/ 5], step:[ 224/ 938], loss:[0.438/0.214], time:94.888 ms, lr:0.01000
Epoch:[ 2/ 5], step:[ 524/ 938], loss:[0.180/0.207], time:1.529 ms, lr:0.01000
Epoch:[ 2/ 5], step:[ 824/ 938], loss:[0.196/0.206], time:1.335 ms, lr:0.01000
Epoch time: 22894.382 ms, per step time: 24.408 ms, avg loss: 0.205
Epoch:[ 3/ 5], step:[ 186/ 938], loss:[0.358/0.166], time:94.699 ms, lr:0.01000
Epoch:[ 3/ 5], step:[ 486/ 938], loss:[0.436/0.183], time:1.296 ms, lr:0.01000
Epoch:[ 3/ 5], step:[ 786/ 938], loss:[0.133/0.185], time:1.455 ms, lr:0.01000
Epoch time: 23103.684 ms, per step time: 24.631 ms, avg loss: 0.186
Epoch:[ 4/ 5], step:[ 148/ 938], loss:[0.344/0.164], time:1.226 ms, lr:0.01000
Epoch:[ 4/ 5], step:[ 448/ 938], loss:[0.186/0.176], time:1.553 ms, lr:0.01000
Epoch:[ 4/ 5], step:[ 748/ 938], loss:[0.052/0.180], time:1.343 ms, lr:0.01000
Epoch time: 22493.676 ms, per step time: 23.980 ms, avg loss: 0.179
selfGD training...
Epoch:[ 0/ 5], step:[ 300/ 938], loss:[0.064/0.132], time:1.610 ms, lr:0.01000
Epoch:[ 0/ 5], step:[ 600/ 938], loss:[0.055/0.124], time:1.645 ms, lr:0.01000
Epoch:[ 0/ 5], step:[ 900/ 938], loss:[0.140/0.123], time:1.344 ms, lr:0.01000
Epoch time: 21690.138 ms, per step time: 23.124 ms, avg loss: 0.123
Epoch:[ 1/ 5], step:[ 262/ 938], loss:[0.168/0.116], time:92.954 ms, lr:0.01000
Epoch:[ 1/ 5], step:[ 562/ 938], loss:[0.024/0.111], time:1.868 ms, lr:0.01000
Epoch:[ 1/ 5], step:[ 862/ 938], loss:[0.227/0.110], time:1.463 ms, lr:0.01000
Epoch time: 20572.306 ms, per step time: 21.932 ms, avg loss: 0.112
Epoch:[ 2/ 5], step:[ 224/ 938], loss:[0.038/0.111], time:1.790 ms, lr:0.01000
Epoch:[ 2/ 5], step:[ 524/ 938], loss:[0.073/0.108], time:1.709 ms, lr:0.01000
Epoch:[ 2/ 5], step:[ 824/ 938], loss:[0.087/0.106], time:1.405 ms, lr:0.01000
Epoch time: 20296.911 ms, per step time: 21.638 ms, avg loss: 0.106
Epoch:[ 3/ 5], step:[ 186/ 938], loss:[0.272/0.104], time:1.933 ms, lr:0.01000
Epoch:[ 3/ 5], step:[ 486/ 938], loss:[0.210/0.102], time:1.399 ms, lr:0.01000
Epoch:[ 3/ 5], step:[ 786/ 938], loss:[0.033/0.102], time:1.944 ms, lr:0.01000
Epoch time: 20298.339 ms, per step time: 21.640 ms, avg loss: 0.103
Epoch:[ 4/ 5], step:[ 148/ 938], loss:[0.034/0.104], time:1.447 ms, lr:0.01000
Epoch:[ 4/ 5], step:[ 448/ 938], loss:[0.122/0.100], time:1.431 ms, lr:0.01000
Epoch:[ 4/ 5], step:[ 748/ 938], loss:[0.017/0.100], time:1.207 ms, lr:0.01000
Epoch time: 20503.968 ms, per step time: 21.859 ms, avg loss: 0.100
```
Step 8: Evaluate accuracy on the test set

Step description: Output the inference accuracy on the test set for each of the three models.

Code and explanation:

```python
# Evaluate each trained model on its test dataset
acc_SGD = model_for_SGD.eval(dataset_test_for_SGD)
acc_Adam = model_for_Adam.eval(dataset_test_for_Adam)
acc_selfGD = model_for_selfGD.eval(dataset_test_for_selfGD)

print("SGD:{}".format(acc_SGD))
print("Adam:{}".format(acc_Adam))
print("selfGD:{}".format(acc_selfGD))
```

Output and interpretation:

```
SGD:{'accuracy': 0.96}
Adam:{'accuracy': 0.96}
selfGD:{'accuracy': 0.96}
```
7 Experiment Results and Interpretation

In terms of prediction accuracy on the test set, the self-implemented gradient descent algorithm performs as well as SGD and Adam: all three reach 0.96.

In terms of the training process, SGD averaged 19.719 ms per step with an average loss of 0.232; Adam averaged 23.980 ms per step with an average loss of 0.179; the self-implemented gradient descent algorithm averaged 21.859 ms per step with an average loss of 0.100.

So SGD trains faster but ends with a larger average loss, Adam trains more slowly but reaches a smaller average loss, and the plain gradient descent implementation reaches a very small loss with moderate training time.
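As background for this comparison, Adam's extra per-step cost and faster loss reduction come from its adaptive update (shown here in its standard form, which optimizers such as nn.Adam are based on). With gradient $g_t$, decay rates $\beta_1, \beta_2$, and a small constant $\epsilon$:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2$$

$$\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_{t+1} = \theta_t - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon}$$

Maintaining the two moment estimates for every parameter is what makes each step slower than plain SGD, while the per-parameter scaling typically lowers the loss faster at the same nominal learning rate.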
Advantages and disadvantages of gradient descent:

Advantages:
1. Gradient descent can solve problems where the optimum cannot be obtained by analytical derivation;
2. It is simple and easy to implement and to get started with.

Disadvantages:
1. The step size (learning rate) is hard to choose: if it is too small, reaching the optimum is very slow; if it is too large, the iterations may never reach the optimum (see the sketch after this list);
2. Even for convex functions the result may only be an approximate optimum, and for non-convex functions it may only be a local optimum.
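To illustrate disadvantage 1, here is a minimal, framework-independent sketch (an illustrative example under the assumption f(x) = x², not part of the experiment code) that runs plain gradient descent with different step sizes:

```python
# Plain gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
# The minimum is at x = 0; the step size decides whether and how fast we get there.
def gradient_descent(lr, steps=20, x0=1.0):
    x = x0
    for _ in range(steps):
        grad = 2 * x          # gradient of f(x) = x^2
        x = x - lr * grad     # plain gradient descent update
    return x

print(gradient_descent(lr=0.01))  # too small: still far from 0 after 20 steps
print(gradient_descent(lr=0.9))   # large but stable: quickly close to 0
print(gradient_descent(lr=1.1))   # too large: each update overshoots and |x| grows
```

With lr=0.01 the iterate is still around 0.67 after 20 steps, with lr=0.9 it is already near 0, and with lr=1.1 the iterations diverge, which is exactly the sensitivity to step size noted above.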