Gradient Descent Analysis Experiment

bit_zhy, posted on 2023/06/29 02:21:34
[Abstract] Machine learning experiment: gradient descent analysis

Analysis of Gradient Descent Methods

 

1        Experiment Overview

1. Using the same classification algorithm (e.g., a neural network), design and implement a gradient descent method;

2. Compare and analyze it experimentally against existing gradient descent methods such as SGD and Adam;

3. Summarize the advantages and disadvantages of current gradient descent methods.

2        Experiment Objectives

  • Master the gradient descent algorithm

3        Related Theory and Knowledge Points

1) Principle of gradient descent (see the update rule below)
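
As a brief recap (the standard textbook formulation, not specific to this experiment): gradient descent minimizes a loss L(θ) by repeatedly stepping against its gradient, with the learning rate η controlling the step size:

\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)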

4        Experiment Tasks and Grading Criteria

No.: 1

Task name: Using the same classification algorithm (e.g., a neural network), design and implement a gradient descent method.

Specific requirements: compare and analyze it experimentally against existing gradient descent methods such as SGD and Adam; summarize the advantages and disadvantages of current gradient descent methods; complete the work with the MindSpore framework, writing the code in Jupyter. Development language: Python.

Grading criteria (100-point scale):

 

Submission requirements: complete the work with the MindSpore framework in Jupyter; submit the code to the Huawei Cloud platform, and also package the code and implementation report and submit them to the Lexue platform, named: student ID + name + Gradient Descent Analysis.

 

5        Experiment Conditions and Environment

 

Requirement                   | Name      | Version | Notes
Programming language          | Python    |         |
Development environment       | Jupyter   |         |
Third-party packages/plugins  | MindSpore |         |

 

6        Experiment Steps and Code

Step No.: 1

Step name: Define hyperparameters

Step description: Set the training batch size, learning rate, and number of training epochs.

Code and explanation:

batch_size = 64
learning_rate = 0.01
num_epoches = 5

Output and interpretation:

 

 

Step No.: 2

Step name: Import and load the dataset

Step description: Download the MNIST dataset; once the download completes, we obtain dataset objects. MindSpore's dataset module uses a data processing pipeline (Data Processing Pipeline), in which operations such as map, batch, and shuffle are specified. Here we use map to transform the image data and labels, then group the processed dataset into batches of the configured batch size.

Code and explanation:

import mindspore
from download import download
from mindspore.dataset import vision, transforms
from mindspore.dataset import MnistDataset

url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/" \
      "notebook/datasets/MNIST_Data.zip"
path = download(url, "./", kind="zip", replace=True)

# Create an independent dataset object for each optimizer under comparison.
dataset_train_for_SGD = MnistDataset('MNIST_Data/train')
dataset_test_for_SGD = MnistDataset('MNIST_Data/test')
dataset_train_for_Adam = MnistDataset('MNIST_Data/train')
dataset_test_for_Adam = MnistDataset('MNIST_Data/test')
dataset_train_for_selfGD = MnistDataset('MNIST_Data/train')
dataset_test_for_selfGD = MnistDataset('MNIST_Data/test')

def datapipe(dataset, batch_size):
    # Rescale pixels to [0, 1], normalize with the MNIST mean/std, and convert HWC to CHW.
    image_transforms = [
        vision.Rescale(1.0 / 255.0, 0),
        vision.Normalize(mean=(0.1307,), std=(0.3081,)),
        vision.HWC2CHW()
    ]
    # Cast labels to int32, as expected by the loss function.
    label_transform = transforms.TypeCast(mindspore.int32)

    dataset = dataset.map(image_transforms, 'image')
    dataset = dataset.map(label_transform, 'label')
    dataset = dataset.batch(batch_size)
    return dataset

dataset_train_for_SGD = datapipe(dataset_train_for_SGD, batch_size)
dataset_test_for_SGD = datapipe(dataset_test_for_SGD, batch_size)
dataset_train_for_Adam = datapipe(dataset_train_for_Adam, batch_size)
dataset_test_for_Adam = datapipe(dataset_test_for_Adam, batch_size)
dataset_train_for_selfGD = datapipe(dataset_train_for_selfGD, batch_size)
dataset_test_for_selfGD = datapipe(dataset_test_for_selfGD, batch_size)

Output and interpretation:

Downloading data from https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/MNIST_Data.zip (10.3 MB)

 

file_sizes: 100%|███████████████████████████| 10.8M/10.8M [00:00<00:00, 147MB/s]

Extracting zip file...

Successfully downloaded / unzipped to ./

 

Step No.: 3

Step name: Define the model

Step description: Define a simple MLP model for classification.

Code and explanation:

import mindspore.nn as nn

class Network(nn.Cell):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        # Three fully connected layers with ReLU activations: 784 -> 512 -> 512 -> 10.
        self.dense_relu_sequential = nn.SequentialCell(
            nn.Dense(28*28, 512),
            nn.ReLU(),
            nn.Dense(512, 512),
            nn.ReLU(),
            nn.Dense(512, 10)
        )

    def construct(self, x):
        # Flatten each 1x28x28 image into a 784-dimensional vector, then apply the MLP.
        x = self.flatten(x)
        logits = self.dense_relu_sequential(x)
        return logits

network = Network()

Output and interpretation:

 

 

Step No.: 4

Step name: Define the loss function

Step description: Define the cross-entropy loss function.

Code and explanation:

net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
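
For reference, this is the standard softmax cross-entropy; with sparse=True the loss consumes integer class labels y directly, and reduction='mean' averages it over the batch:

L(z, y) = -\log \frac{e^{z_y}}{\sum_{j} e^{z_j}}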

Output and interpretation:

 

 

Step No.: 5

Step name: Define a custom gradient descent optimizer

Step description: Create an Optimizer subclass implementing the plain (vanilla) gradient descent algorithm.

Code and explanation:

from mindspore import ops

class selfGD(nn.Optimizer):
    def __init__(self, params, learning_rate):
        super(selfGD, self).__init__(learning_rate, params)

    def construct(self, gradients):
        lr = self.get_lr()        # current learning rate
        params = self.parameters  # parameters registered with the optimizer

        # Vanilla gradient descent: theta <- theta - lr * grad, written back in place.
        for i in range(len(params)):
            update = params[i] - gradients[i] * lr
            ops.assign(params[i], update)
        return params

Output and interpretation:

 

 

Step No.: 6

Step name: Define the optimizers

Step description: Define the SGD, Adam, and self-implemented selfGD optimizers.

Code and explanation:

net_opt_SGD = nn.SGD(network.trainable_params(), learning_rate=learning_rate)
net_opt_Adam = nn.Adam(network.trainable_params(), learning_rate=learning_rate)
net_opt_selfGD = selfGD(network.trainable_params(), learning_rate=learning_rate)
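
Note that all three optimizers above are built over the parameters of the same network instance. Since step 7 trains the three models sequentially, each later run starts from weights the earlier runs already updated, which should be kept in mind when comparing their losses. A fairer comparison would give each optimizer its own freshly initialized network, e.g. (a minimal sketch under that assumption; the network_for_* names are ours and are not part of the original run):

network_for_SGD = Network()     # independent, freshly initialized copies
network_for_Adam = Network()
network_for_selfGD = Network()

net_opt_SGD = nn.SGD(network_for_SGD.trainable_params(), learning_rate=learning_rate)
net_opt_Adam = nn.Adam(network_for_Adam.trainable_params(), learning_rate=learning_rate)
net_opt_selfGD = selfGD(network_for_selfGD.trainable_params(), learning_rate=learning_rate)

Under this variant, the corresponding network would also be passed to each Model in step 7.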

Output and interpretation:

 

 

Step No.: 7

Step name: Train the model

Step description: Train the same model with each of the three optimizers in turn.

Code and explanation:

from mindvision.engine.callback import LossMonitor
from mindspore.train import Model

model_for_SGD = Model(network, loss_fn=net_loss, optimizer=net_opt_SGD, metrics={'accuracy'})
model_for_Adam = Model(network, loss_fn=net_loss, optimizer=net_opt_Adam, metrics={'accuracy'})
model_for_selfGD = Model(network, loss_fn=net_loss, optimizer=net_opt_selfGD, metrics={'accuracy'})

print('SGD training...')
model_for_SGD.train(num_epoches, dataset_train_for_SGD, callbacks=[LossMonitor(learning_rate, 300)])
print('Adam training...')
model_for_Adam.train(num_epoches, dataset_train_for_Adam, callbacks=[LossMonitor(learning_rate, 300)])
print('selfGD training...')
model_for_selfGD.train(num_epoches, dataset_train_for_selfGD, callbacks=[LossMonitor(learning_rate, 300)])

Output and interpretation:

SGD training...

Epoch:[  0/  5], step:[  300/  938], loss:[2.205/2.270], time:9.161 ms, lr:0.01000

Epoch:[  0/  5], step:[  600/  938], loss:[1.076/2.001], time:1.211 ms, lr:0.01000

Epoch:[  0/  5], step:[  900/  938], loss:[0.489/1.590], time:1.346 ms, lr:0.01000

Epoch time: 21175.992 ms, per step time: 22.576 ms, avg loss: 1.548

Epoch:[  1/  5], step:[  262/  938], loss:[0.361/0.495], time:1.345 ms, lr:0.01000

Epoch:[  1/  5], step:[  562/  938], loss:[0.473/0.454], time:79.919 ms, lr:0.01000

Epoch:[  1/  5], step:[  862/  938], loss:[0.361/0.426], time:1.656 ms, lr:0.01000

Epoch time: 18970.728 ms, per step time: 20.225 ms, avg loss: 0.421

Epoch:[  2/  5], step:[  224/  938], loss:[0.306/0.337], time:1.547 ms, lr:0.01000

Epoch:[  2/  5], step:[  524/  938], loss:[0.433/0.333], time:86.639 ms, lr:0.01000

Epoch:[  2/  5], step:[  824/  938], loss:[0.200/0.326], time:93.757 ms, lr:0.01000

Epoch time: 20900.811 ms, per step time: 22.282 ms, avg loss: 0.322

Epoch:[  3/  5], step:[  186/  938], loss:[0.207/0.278], time:1.352 ms, lr:0.01000

Epoch:[  3/  5], step:[  486/  938], loss:[0.317/0.282], time:1.412 ms, lr:0.01000

Epoch:[  3/  5], step:[  786/  938], loss:[0.656/0.272], time:1.775 ms, lr:0.01000

Epoch time: 20902.115 ms, per step time: 22.284 ms, avg loss: 0.271

Epoch:[  4/  5], step:[  148/  938], loss:[0.076/0.251], time:85.664 ms, lr:0.01000

Epoch:[  4/  5], step:[  448/  938], loss:[0.119/0.246], time:1.291 ms, lr:0.01000

Epoch:[  4/  5], step:[  748/  938], loss:[0.243/0.235], time:1.466 ms, lr:0.01000

Epoch time: 18496.216 ms, per step time: 19.719 ms, avg loss: 0.232

Adam training...

Epoch:[  0/  5], step:[  300/  938], loss:[0.276/0.669], time:1.306 ms, lr:0.01000

Epoch:[  0/  5], step:[  600/  938], loss:[0.386/0.475], time:93.253 ms, lr:0.01000

Epoch:[  0/  5], step:[  900/  938], loss:[0.207/0.405], time:1.714 ms, lr:0.01000

Epoch time: 23533.298 ms, per step time: 25.089 ms, avg loss: 0.400

Epoch:[  1/  5], step:[  262/  938], loss:[0.354/0.228], time:1.341 ms, lr:0.01000

Epoch:[  1/  5], step:[  562/  938], loss:[0.258/0.230], time:93.148 ms, lr:0.01000

Epoch:[  1/  5], step:[  862/  938], loss:[0.098/0.226], time:1.542 ms, lr:0.01000

Epoch time: 22967.152 ms, per step time: 24.485 ms, avg loss: 0.226

Epoch:[  2/  5], step:[  224/  938], loss:[0.438/0.214], time:94.888 ms, lr:0.01000

Epoch:[  2/  5], step:[  524/  938], loss:[0.180/0.207], time:1.529 ms, lr:0.01000

Epoch:[  2/  5], step:[  824/  938], loss:[0.196/0.206], time:1.335 ms, lr:0.01000

Epoch time: 22894.382 ms, per step time: 24.408 ms, avg loss: 0.205

Epoch:[  3/  5], step:[  186/  938], loss:[0.358/0.166], time:94.699 ms, lr:0.01000

Epoch:[  3/  5], step:[  486/  938], loss:[0.436/0.183], time:1.296 ms, lr:0.01000

Epoch:[  3/  5], step:[  786/  938], loss:[0.133/0.185], time:1.455 ms, lr:0.01000

Epoch time: 23103.684 ms, per step time: 24.631 ms, avg loss: 0.186

Epoch:[  4/  5], step:[  148/  938], loss:[0.344/0.164], time:1.226 ms, lr:0.01000

Epoch:[  4/  5], step:[  448/  938], loss:[0.186/0.176], time:1.553 ms, lr:0.01000

Epoch:[  4/  5], step:[  748/  938], loss:[0.052/0.180], time:1.343 ms, lr:0.01000

Epoch time: 22493.676 ms, per step time: 23.980 ms, avg loss: 0.179

selfGD training...

Epoch:[  0/  5], step:[  300/  938], loss:[0.064/0.132], time:1.610 ms, lr:0.01000

Epoch:[  0/  5], step:[  600/  938], loss:[0.055/0.124], time:1.645 ms, lr:0.01000

Epoch:[  0/  5], step:[  900/  938], loss:[0.140/0.123], time:1.344 ms, lr:0.01000

Epoch time: 21690.138 ms, per step time: 23.124 ms, avg loss: 0.123

Epoch:[  1/  5], step:[  262/  938], loss:[0.168/0.116], time:92.954 ms, lr:0.01000

Epoch:[  1/  5], step:[  562/  938], loss:[0.024/0.111], time:1.868 ms, lr:0.01000

Epoch:[  1/  5], step:[  862/  938], loss:[0.227/0.110], time:1.463 ms, lr:0.01000

Epoch time: 20572.306 ms, per step time: 21.932 ms, avg loss: 0.112

Epoch:[  2/  5], step:[  224/  938], loss:[0.038/0.111], time:1.790 ms, lr:0.01000

Epoch:[  2/  5], step:[  524/  938], loss:[0.073/0.108], time:1.709 ms, lr:0.01000

Epoch:[  2/  5], step:[  824/  938], loss:[0.087/0.106], time:1.405 ms, lr:0.01000

Epoch time: 20296.911 ms, per step time: 21.638 ms, avg loss: 0.106

Epoch:[  3/  5], step:[  186/  938], loss:[0.272/0.104], time:1.933 ms, lr:0.01000

Epoch:[  3/  5], step:[  486/  938], loss:[0.210/0.102], time:1.399 ms, lr:0.01000

Epoch:[  3/  5], step:[  786/  938], loss:[0.033/0.102], time:1.944 ms, lr:0.01000

Epoch time: 20298.339 ms, per step time: 21.640 ms, avg loss: 0.103

Epoch:[  4/  5], step:[  148/  938], loss:[0.034/0.104], time:1.447 ms, lr:0.01000

Epoch:[  4/  5], step:[  448/  938], loss:[0.122/0.100], time:1.431 ms, lr:0.01000

Epoch:[  4/  5], step:[  748/  938], loss:[0.017/0.100], time:1.207 ms, lr:0.01000

Epoch time: 20503.968 ms, per step time: 21.859 ms, avg loss: 0.100

 

Step No.: 8

Step name: Report test-set inference accuracy

Step description: Evaluate each trained model on the test set and print its inference accuracy.

Code and explanation:

acc_SGD = model_for_SGD.eval(dataset_test_for_SGD)
acc_Adam = model_for_Adam.eval(dataset_test_for_Adam)
acc_selfGD = model_for_selfGD.eval(dataset_test_for_selfGD)

print("SGD:{}".format(acc_SGD))
print("Adam:{}".format(acc_Adam))
print("selfGD:{}".format(acc_selfGD))

Output and interpretation:

SGD:{'accuracy': 0.96}

Adam:{'accuracy': 0.96}

selfGD:{'accuracy': 0.96}

 

7        Experiment Results and Interpretation

Judging by the prediction accuracy on the test set, the self-implemented gradient descent algorithm performs on par with SGD and Adam (all reach 0.96).

Looking at the training process (final-epoch figures): SGD averaged 19.719 ms per step with an average loss of 0.232; Adam averaged 23.980 ms per step with an average loss of 0.179; the self-implemented gradient descent averaged 21.859 ms per step with an average loss of 0.100.

So SGD trains fastest but ends with the largest average loss; Adam trains slowest but reaches a smaller average loss; the plain gradient descent shows the smallest loss with a moderate training time. Note, however, that the three models share one network instance and were trained sequentially, so selfGD started from weights already trained by SGD and Adam (see the note in step 6) and its loss figures are not directly comparable.

Advantages and disadvantages of gradient descent:

Advantages:

1. Gradient descent can solve problems where the optimum of a function cannot be derived analytically;

2. It is simple and easy to implement and get started with.

Disadvantages:

1. The step size is hard to choose: if it is too small, reaching the optimum can be very slow; if it is too large, the method may fail to converge at all (see the sketch after this list);

2. On convex functions it may only reach an approximate optimum; on non-convex functions it may get stuck in a local optimum.
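
As an illustration of point 1, here is a minimal, self-contained sketch (a hypothetical one-dimensional example, f(x) = x² with gradient 2x; the function and step sizes are ours, not from the experiment):

# Gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
def gd(lr, x0=1.0, steps=20):
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # x <- x - lr * f'(x)
    return x

print(gd(0.01))  # ~0.667: step too small, still far from the optimum at 0
print(gd(0.1))   # ~0.012: reasonable step, converging toward 0
print(gd(1.1))   # ~38.3 in magnitude: step too large, the iterates diverge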
