2020 Artificial Neural Networks, Homework 1: Reference Answers, Part 3

tsinghuazhuoqing, posted 2021/12/25 23:28:49

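This post is Part 3 of the reference answers to Homework 1 of the 2020 Artificial Neural Networks course.

➤03 Reference answer to Problem 3

1. Function approximation with a BP network

(1) Approximating a simple function

Build a neural network with a single hidden layer of 20 nodes. The hidden nodes use the sigmoid transfer function, and the output neuron uses a linear transfer function.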

^{▲ Neural network structure}
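
As a minimal sketch of the structure described above (1 input, 20 sigmoid hidden nodes, 1 linear output), the forward pass can be written with NumPy as follows. The helper names `sigmoid` and `forward`, the random seed and the initial weight scale are illustrative assumptions, not taken from the original program.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, the hidden-layer transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """x: (m, 1) samples in rows; returns the (1, m) network output."""
    a1 = sigmoid(W1 @ x.T + b1)   # hidden activations, shape (n_h, m)
    return W2 @ a1 + b2           # linear output neuron, shape (1, m)

rng = np.random.default_rng(0)    # assumed seed, for reproducibility only
n_h = 20                          # hidden nodes, as stated in the text
W1 = rng.standard_normal((n_h, 1)) * 0.5
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.5
b2 = np.zeros((1, 1))

x = np.linspace(0.0, 1.0, 50).reshape(-1, 1)   # 50 uniform samples
y = forward(x, W1, b1, W2, b2)
print(y.shape)
```

Samples are stored in rows of `x` and in columns of the output, which is the same layout the full listing later in this post uses.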

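Take 50 uniformly spaced samples directly on (0, 1) and train the network above with the most basic BP algorithm.

^{▲ Function to approximate}

As the number of training iterations grows, the network's input-output relationship evolves as follows:

^{▲ Evolution of the network's function during training}

The network's error convergence curve is shown below:

^{▲ Network error convergence curve}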

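(2) Preprocessing the data

Map the input x into (-0.5, 0.5).
Map the network output into (-0.8, 0.8).

^{▲ Evolution of the network's function during training}

^{▲ Network training error convergence curve}

After preprocessing, both the inputs and the outputs of the samples are symmetric about 0, and both the convergence speed and the accuracy of the network improve.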
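
The two mappings can be sketched as one linear rescaling. The helper `rescale` is hypothetical, and the target `sin(2*pi*x)` is an assumption made only to have data to rescale; the original post shows the target function in a figure.

```python
import numpy as np

def rescale(v, lo, hi):
    """Linearly map the values of v from [v.min(), v.max()] onto [lo, hi]."""
    vmin, vmax = v.min(), v.max()
    return lo + (v - vmin) * (hi - lo) / (vmax - vmin)

x_raw = np.linspace(0.0, 1.0, 50)
y_raw = np.sin(2 * np.pi * x_raw)      # assumed target function

x_in = rescale(x_raw, -0.5, 0.5)       # network input, symmetric about 0
y_out = rescale(y_raw, -0.8, 0.8)      # training target, symmetric about 0
```

To read predictions back in the original units, apply the inverse mapping, for example `rescale`-style arithmetic from (-0.8, 0.8) back to the raw output range.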

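(3) Changing the hidden-layer transfer function

Change the hidden-layer transfer function to the hyperbolic tangent and set the learning rate to $\eta = 0.25$. Convergence becomes faster once again.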
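
The listing later in this post computes the hidden activation as `(1-exp(-Z1))/(1+exp(-Z1))`. The short check below, which is an added illustration and not part of the original program, confirms that this expression equals `tanh(z/2)` and that its derivative with respect to `z` is `(1 - a**2)/2`.

```python
import numpy as np

z = np.linspace(-4.0, 4.0, 9)

bipolar = (1 - np.exp(-z)) / (1 + np.exp(-z))   # form used in the listing
same_as_tanh = np.allclose(bipolar, np.tanh(z / 2))

# Derivative with respect to z, checked against a central difference.
h = 1e-6
numeric = (np.tanh((z + h) / 2) - np.tanh((z - h) / 2)) / (2 * h)
analytic = (1 - bipolar**2) / 2
print(same_as_tanh, np.allclose(numeric, analytic, atol=1e-6))
```

Using `(1 - a**2)` without the factor 1/2 in back-propagation therefore doubles the gradient of the first-layer parameters, which behaves like a larger learning rate for that layer.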

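^{▲ Network input-output relationship as training proceeds}

^{▲ Network training error convergence curve}

(4) Training on a four-period function

Learning rate $\eta = 0.5$

^{▲ Function to approximate}

^{▲ Approximation result}

^{▲ Network training error convergence curve}

(5) Training on a six-period function

A network with 20 hidden nodes is used to approximate a six-period sin function. Training of this single-hidden-layer network does not converge.

^{▲ The six-period function to approximate}

As training proceeds, the network's input-output function settles at an intermediate value.

^{▲ Evolution of the network's input-output function}

^{▲ Network error convergence curve}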

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HW13BP.PY                    -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================

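# headm is the author's own helper module (not shown here). The code below
# relies on it for the NumPy names (linspace, sin, dot, random, ...), plt,
# printf and PlotGIF.
from headm import *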

#------------------------------------------------------------
# Samples data construction

x_data = linspace(-0.5, 0.5, 50).reshape(-1, 1)
y_data = (sin(x_data*6*pi)*0.8).reshape(1,-1)

#------------------------------------------------------------

xx = linspace(-0.5,0.5, 500)
yy = sin(xx*6*pi) * 0.8
plt.plot(xx, yy, label='0.8*sin(6*pi*x)')
plt.scatter(x_data.reshape(-1,1), y_data.reshape(-1,1), color='r')
plt.xlabel("x")
plt.ylabel("f(x)")
plt.grid(True)
plt.tight_layout()
plt.show()

#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T

#------------------------------------------------------------
# Define and initialize the NN
def initialize_parameters(n_x, n_h, n_y):
    random.seed(2)

    W1 = random.randn(n_h, n_x) * 0.5          # dot(W1,X.T)
    W2 = random.randn(n_y, n_h) * 0.5          # dot(W2,Z1)
    b1 = zeros((n_h, 1))                       # Column vector
    b2 = zeros((n_y, 1))                       # Column vector

    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}

    return parameters

#------------------------------------------------------------
# Forward propagation
# X:row->sample;
# Z2:col->sample
def forward_propagate(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = dot(W1, X.T) + b1                    # X:row-->sample; Z1:col-->sample
#    A1 = 1/(1+exp(-Z1))
    # Bipolar sigmoid; algebraically equal to tanh(Z1/2)
    A1 = (1-exp(-Z1))/(1+exp(-Z1))

    Z2 = dot(W2, A1) + b2                     # Z2:col-->sample
#    A2 = 1/(1+exp(-Z2))                       # A:col-->sample
    A2 = Z2                                   # Linear output

    cache = {'Z1':Z1,
             'A1':A1,
             'Z2':Z2,
             'A2':A2}
    return Z2, cache

#------------------------------------------------------------
# Calculate the cost
# A2,Y: col->sample
def calculate_cost(A2, Y, parameters):
    err = [x1-x2 for x1,x2 in zip(A2.T, Y.T)]
    cost = [dot(e,e) for e in err]
    return mean(cost)

#------------------------------------------------------------
# Backward propagation
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                  # Number of the samples

    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']

    dZ2 = (A2 - Y) #* (A2 * (1-A2))
    dW2 = dot(dZ2, A1.T) / m
    db2 = sum(dZ2, axis=1, keepdims=True) / m

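#    dZ1 = dot(W2.T, dZ2) * (A1 * (1-A1))
    # For A1 = tanh(Z1/2) the exact derivative is (1-A1**2)/2; the factor
    # 1/2 is dropped here, which only rescales the step for W1 and b1.
    dZ1 = dot(W2.T, dZ2) * (1-A1**2)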
    dW1 = dot(dZ1, X) / m
    db1 = sum(dZ1, axis=1, keepdims=True) / m

    grads = {'dW1':dW1,
             'db1':db1,
             'dW2':dW2,
             'db2':db2}

    return grads

#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    W1 = W1 - learning_rate * dW1
    W2 = W2 - learning_rate * dW2
    b1 = b1 - learning_rate * db1
    b2 = b2 - learning_rate * db2

    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}

    return parameters

#------------------------------------------------------------
# Define the training
DISP_STEP           = 2000

#------------------------------------------------------------
pltgif = PlotGIF()

#------------------------------------------------------------
def train(X, Y, num_iterations, learning_rate, print_cost=False):
#    random.seed(3)

    n_x = 1
    n_y = 1
    n_h = 20

    lr = learning_rate

    parameters = initialize_parameters(n_x, n_h, n_y)
    XX,YY = shuffledata(X, Y)

    costdim = []
    x = linspace(-0.5, 0.5, 250).reshape(-1,1)

    for i in range(0, num_iterations):
        A2, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A2, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)

        if print_cost and i % DISP_STEP == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost)

            plt.clf()
            y,cache = forward_propagate(x, parameters)
            plt.plot(x.reshape(-1,1), y.reshape(-1,1))
            plt.xlabel("x")
            plt.ylabel("f(x)")
            plt.grid(True)
            plt.tight_layout()
            plt.draw()
            plt.pause(.1)
            pltgif.append(plt)

            if cost < 0.0001:
                break

            XX,YY = shuffledata(X, Y)

    return parameters, costdim

#------------------------------------------------------------
parameter,costdim = train(x_data, y_data, 200000, 0.5, True)
pltgif.save(r'd:\temp\1.gif')

#------------------------------------------------------------
plt.clf()
plt.plot(arange(len(costdim))*DISP_STEP, costdim)
plt.xlabel("Step")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()

#------------------------------------------------------------
#        END OF FILE : HW13BP.PY
#============================================================
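
The gradients used in the listing can be sanity-checked against finite differences. The sketch below is an added illustration, not part of the original program: it re-implements the forward and backward passes with `np.tanh` as the hidden activation, and the names `half_mse` and `numeric_grad` are hypothetical. Note that the listing's `calculate_cost` reports the mean squared error, while `dZ2 = A2 - Y` together with the division by `m` is the gradient of half of that value.

```python
import numpy as np

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X.T + p["b1"])      # hidden layer, (n_h, m)
    A2 = p["W2"] @ A1 + p["b2"]                # linear output, (1, m)
    return A1, A2

def half_mse(X, Y, p):
    _, A2 = forward(X, p)
    return 0.5 * np.mean((A2 - Y) ** 2)

def backward(X, Y, p):
    """Back-propagated gradients of half_mse with respect to each parameter."""
    m = X.shape[0]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)    # tanh'(z) = 1 - tanh(z)^2
    return {"W1": dZ1 @ X / m, "b1": dZ1.sum(axis=1, keepdims=True) / m,
            "W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m}

def numeric_grad(X, Y, p, key, eps=1e-6):
    """Central-difference gradient of half_mse for one parameter array."""
    g = np.zeros_like(p[key])
    for idx in np.ndindex(g.shape):
        old = p[key][idx]
        p[key][idx] = old + eps
        up = half_mse(X, Y, p)
        p[key][idx] = old - eps
        down = half_mse(X, Y, p)
        p[key][idx] = old                      # restore the parameter
        g[idx] = (up - down) / (2 * eps)
    return g

rng = np.random.default_rng(2)
X = np.linspace(-0.5, 0.5, 50).reshape(-1, 1)
Y = (0.8 * np.sin(6 * np.pi * X)).reshape(1, -1)
p = {"W1": rng.standard_normal((20, 1)) * 0.5, "b1": np.zeros((20, 1)),
     "W2": rng.standard_normal((1, 20)) * 0.5, "b2": np.zeros((1, 1))}

grads = backward(X, Y, p)
max_err = max(np.abs(grads[k] - numeric_grad(X, Y, p, k)).max() for k in p)
print(max_err)
```

A small `max_err` indicates that the analytic gradients agree with the numerical ones for this cost definition.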

  
 
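2. Changing the BP network's hidden-layer transfer function to ReLU

Change the transfer function of the hidden layer in the network above to the ReLU function.

Changes to the program:
In forward_propagate(), A1 is computed as: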

A1 = maximum(Z1, 0)            # ReLU; builds a new array, so Z1 is left unchanged
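
One point to watch when clipping with NumPy: a plain assignment such as `A1 = Z1` binds a second name to the same array, so masking `A1` in place also overwrites `Z1`. The added sketch below (not part of the original program) contrasts that with `np.maximum`, which returns a new array.

```python
import numpy as np

Z1 = np.array([[-1.0, 2.0], [3.0, -4.0]])

A1_alias = Z1                  # same array object, not a copy
A1_alias[A1_alias < 0] = 0     # clips Z1 as well
print(Z1)

Z1 = np.array([[-1.0, 2.0], [3.0, -4.0]])
A1 = np.maximum(Z1, 0)         # new array; Z1 keeps its negative entries
print(Z1)
print(A1)
```

In this program the pre-activation `Z1` is not reused after the forward pass, so the aliasing does not change the training result, but keeping the cached `Z1` intact avoids surprises if the code is extended.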

  
 
In backward_propagate():

A11 = (A1 > 0).astype(float)   # ReLU derivative: 1 where the unit is active, else 0
dZ1 = dot(W2.T, dZ2) * A11
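
The 0/1 mask is the derivative of ReLU expressed through its own output. The following added check, with hypothetical helper names `relu` and `relu_grad`, compares the mask with a central difference away from the kink at zero.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def relu_grad(a):
    """Derivative of ReLU expressed through its output a = relu(z)."""
    return (a > 0).astype(float)

z = np.array([-2.0, -0.5, 0.5, 2.0])   # avoid z = 0, where ReLU has a kink
h = 1e-6
numeric = (relu(z + h) - relu(z - h)) / (2 * h)
mask = relu_grad(relu(z))
print(np.allclose(numeric, mask))
```

At exactly `z = 0` the derivative is not defined; treating it as 0, as the mask does, is a common convention.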

  
 
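^{▲ Evolution of the network's input-output function}

^{▲ Training error curve when approximating a single sin function}

Judging from the results above, with ReLU and the same number of hidden nodes, the approximation error is relatively large.

To improve the approximation, the number of hidden neurons is increased to 50. The approximation result is then as follows:

^{▲ Network convergence with 50 hidden nodes}

^{▲ Network error convergence}

Below is the process of approximating a four-period sin function.

^{▲ Process of approximating a four-period sin function}

^{▲ Network training error convergence}

Below is the convergence process when approximating a six-period sin function.

^{▲ Evolution of the network's input-output relationship when approximating a six-period sin signal}

^{▲ Network error convergence when approximating a six-period sin function}

Next, the number of hidden nodes is raised to 100 and the number of training samples to 250, still using the ReLU transfer function to approximate the six-period sin function. The convergence is as follows:

The network's input-output function changes as follows as the number of training iterations increases:

^{▲ Evolution of the network's input-output function}

The training error converges as follows as the number of training iterations increases:

^{▲ Network error when approximating a six-period sin function}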

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HW13BPRELU.PY                    -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================

from headm import *

#------------------------------------------------------------
# Samples data construction

x_data = linspace(-0.5, 0.5, 250).reshape(-1, 1)
y_data = (sin(x_data*4*pi)*0.8).reshape(1,-1)

#------------------------------------------------------------
'''
xx = linspace(-0.5,0.5, 500)
yy = sin(xx*6*pi) * 0.8
plt.plot(xx, yy, label='sin(2pix)')
plt.scatter(x_data.reshape(-1,1), y_data.reshape(-1,1), color='r')
plt.xlabel("x")
plt.ylabel("f(x)")
plt.grid(True)
plt.tight_layout()
plt.show()

'''
#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T

#------------------------------------------------------------
# Define and initialization NN
def initialize_parameters(n_x, n_h, n_y):
    random.seed(2)

    W1 = random.randn(n_h, n_x) * 0.5          # dot(W1,X.T)
    W2 = random.randn(n_y, n_h) * 0.5          # dot(W2,Z1)
    b1 = zeros((n_h, 1))                       # Column vector
    b2 = zeros((n_y, 1))                       # Column vector

    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}

    return parameters

#------------------------------------------------------------
# Forward propagation
# X:row->sample;
# Z2:col->sample
def forward_propagate(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = dot(W1, X.T) + b1                    # X:row-->sample; Z1:col-->sample
#    A1 = 1/(1+exp(-Z1))
#    A1 = (1-exp(-Z1))/(1+exp(-Z1))
#    A1 = array([[x if x > 0 else 0 for x in l] for l in Z1])
    A1 = maximum(Z1, 0)                       # ReLU; new array, Z1 is not modified

    Z2 = dot(W2, A1) + b2                     # Z2:col-->sample
#    A2 = 1/(1+exp(-Z2))                       # A:col-->sample
    A2 = Z2                                   # Linear output

    cache = {'Z1':Z1,
             'A1':A1,
             'Z2':Z2,
             'A2':A2}
    return Z2, cache

#------------------------------------------------------------
# Calculate the cost
# A2,Y: col->sample
def calculate_cost(A2, Y, parameters):
    err = [x1-x2 for x1,x2 in zip(A2.T, Y.T)]
    cost = [dot(e,e) for e in err]
    return mean(cost)

#------------------------------------------------------------
# Backward propagation
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                  # Number of the samples

    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']

    dZ2 = (A2 - Y) #* (A2 * (1-A2))
    dW2 = dot(dZ2, A1.T) / m
    db2 = sum(dZ2, axis=1, keepdims=True) / m

#    dZ1 = dot(W2.T, dZ2) * (A1 * (1-A1))
#    dZ1 = dot(W2.T, dZ2) * (1-A1**2)

#    A11 = array([[1 if x >= 0 else 0 for x in l] for l in A1])
    A11 = (A1 > 0).astype(float)              # ReLU derivative mask; A1 is not modified
    dZ1 = dot(W2.T, dZ2) * A11

    dW1 = dot(dZ1, X) / m
    db1 = sum(dZ1, axis=1, keepdims=True) / m

    grads = {'dW1':dW1,
             'db1':db1,
             'dW2':dW2,
             'db2':db2}

    return grads

#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    W1 = W1 - learning_rate * dW1
    W2 = W2 - learning_rate * dW2
    b1 = b1 - learning_rate * db1
    b2 = b2 - learning_rate * db2

    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}

    return parameters

#------------------------------------------------------------
# Define the training
DISP_STEP           = 1000

#------------------------------------------------------------
pltgif = PlotGIF()

#------------------------------------------------------------
def train(X, Y, num_iterations, learning_rate, print_cost=False):
#    random.seed(3)

    n_x = 1
    n_y = 1
    n_h = 50

    lr = learning_rate

    parameters = initialize_parameters(n_x, n_h, n_y)
    XX,YY = shuffledata(X, Y)

    costdim = []
    x = linspace(-0.5, 0.5, 250).reshape(-1,1)

    for i in range(0, num_iterations):
        A2, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A2, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)

        if print_cost and i % DISP_STEP == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost)

            plt.clf()
            y,cache = forward_propagate(x, parameters)
            plt.plot(x.reshape(-1,1), y.reshape(-1,1))
            plt.xlabel("x")
            plt.ylabel("f(x)")
            plt.grid(True)
            plt.tight_layout()
            plt.draw()
            plt.pause(.1)
            pltgif.append(plt)

            if cost < 0.0001:
                break

            XX,YY = shuffledata(X, Y)

    return parameters, costdim

#------------------------------------------------------------
parameter,costdim = train(x_data, y_data, 100000, 0.5, True)
pltgif.save(r'd:\temp\1.gif')

#------------------------------------------------------------
plt.clf()
plt.plot(arange(len(costdim))*DISP_STEP, costdim)
plt.xlabel("Step")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()

#------------------------------------------------------------
#        END OF FILE : HW13BPRELU.PY
#============================================================

  
 
Source: zhuoqing.blog.csdn.net. Author: 卓晴 (Zhuo Qing). Copyright belongs to the original author; please contact the author before reposting.

Original link: zhuoqing.blog.csdn.net/article/details/109757052
