2020 Artificial Neural Networks, Assignment 1 - Reference Answers, Part 3
This post is Part 3 of the reference answers to the first 2020 Artificial Neural Networks assignment.
➤03 Reference Answer to Problem 3
1. Building a BP network for function approximation
(1) Approximating a simple function
Build a single-hidden-layer neural network with 20 hidden nodes, a sigmoid transfer function in the hidden layer, and a linear transfer function at the output neuron.
▲ Network structure
Sample 50 points uniformly on (0, 1) and train the network above with the basic BP algorithm.
▲ Function to approximate
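As a minimal sketch of the structure described above (plain NumPy; 1 input, 20 sigmoid hidden nodes, 1 linear output; the placeholder target sin(2πx) merely stands in for the function shown in the figure), a single forward pass looks like this:
import numpy as np

x = np.linspace(0, 1, 50).reshape(-1, 1)         # 50 samples drawn uniformly on (0, 1)
t = np.sin(2 * np.pi * x)                        # placeholder target (the actual target is shown above)

n_x, n_h, n_y = 1, 20, 1                         # input, hidden and output layer sizes
rng = np.random.default_rng(0)
W1 = rng.standard_normal((n_h, n_x)) * 0.5       # hidden-layer weights
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.5       # output-layer weights
b2 = np.zeros((n_y, 1))

Z1 = W1 @ x.T + b1                               # hidden pre-activation, one column per sample
A1 = 1.0 / (1.0 + np.exp(-Z1))                   # sigmoid hidden transfer function
y = (W2 @ A1 + b2).T                             # linear output neuron, back to one row per sample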
As training proceeds, the network's input-output mapping evolves as follows:
▲ Evolution of the network's function during training
The error convergence curve is shown below:
▲ Error convergence curve
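Judging from calculate_cost() in the listing further below, the error plotted in this curve is the mean squared output error over the training batch, E = (1/m) · Σₖ (dₖ − yₖ)², where dₖ is the target, yₖ the network output for sample k, and m the number of samples.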
(2) Preprocessing the data
- Rescale the input x into (-0.5, 0.5)
- Rescale the network targets into (-0.8, 0.8)
▲ Evolution of the network's function during training
▲ Training error convergence curve
After this preprocessing, which makes both the inputs and the targets symmetric about 0, the convergence speed and the final accuracy both improve.
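The preprocessing amounts to a simple affine rescaling; a minimal sketch (the raw ranges (0, 1) for x and (-1, 1) for the target, and the placeholder target itself, are assumptions based on the description above):
import numpy as np

x_raw = np.linspace(0, 1, 50)                    # raw inputs on (0, 1)
t_raw = np.sin(6 * np.pi * x_raw)                # raw targets in (-1, 1) (placeholder target)

x_scaled = x_raw - 0.5                           # inputs shifted into (-0.5, 0.5), symmetric about 0
t_scaled = t_raw * 0.8                           # targets compressed into (-0.8, 0.8)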
(3) Changing the hidden-layer transfer function
Change the hidden-layer transfer function to the hyperbolic tangent and set the learning rate to η = 0.25. The network now converges faster still.
▲ Network input-output relationship during training
▲ Training error convergence curve
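With the hyperbolic tangent in the hidden layer, the forward value and the derivative used in back-propagation are as follows (a sketch using the Z1/A1 naming of the listing below):
import numpy as np

Z1 = np.linspace(-3, 3, 7).reshape(1, -1)        # example hidden pre-activations
A1 = np.tanh(Z1)                                 # forward: a(z) = tanh(z)
dA1 = 1.0 - A1**2                                # backward: a'(z) = 1 - tanh(z)^2, as used for dZ1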
(4) Approximating four periods of the function
Learning rate η = 0.5.
▲ Function to approximate
▲ Approximation result
▲ Training error convergence curve
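Presumably only the target construction changes for this experiment; following the style of the listings below, the "four-period" samples can be built as (the sample count here is an assumption):
import numpy as np

x_data = np.linspace(-0.5, 0.5, 50).reshape(-1, 1)
y_data = (np.sin(x_data * 4 * np.pi) * 0.8).reshape(1, -1)    # the "four-period" target on (-0.5, 0.5)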
(5) Approximating six periods of the function
With 20 hidden nodes, the single-hidden-layer network does not converge when approximating six periods of the sine function.
▲ Six-period target function
As training proceeds, the network's input-output function settles on an intermediate, averaged value.
▲ Evolution of the network's input-output function
▲ Error convergence curve
#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HW13BP.PY -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================
from headm import *                  # author's helper module (presumably imports numpy, matplotlib.pyplot as plt, printf and PlotGIF)
#------------------------------------------------------------
# Samples data construction
x_data = linspace(-0.5, 0.5, 50).reshape(-1, 1)
y_data = (sin(x_data*6*pi)*0.8).reshape(1,-1)
#------------------------------------------------------------
xx = linspace(-0.5,0.5, 500)
yy = sin(xx*6*pi) * 0.8
plt.plot(xx, yy, label='sin(6*pi*x)*0.8')
plt.scatter(x_data.reshape(-1,1), y_data.reshape(-1,1), color='r')
plt.xlabel("x")
plt.ylabel("f(x)")
plt.grid(True)
plt.tight_layout()
plt.show()
#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T
#------------------------------------------------------------
# Define and initialization NN
def initialize_parameters(n_x, n_h, n_y):
    random.seed(2)
    W1 = random.randn(n_h, n_x) * 0.5          # dot(W1, X.T)
    W2 = random.randn(n_y, n_h) * 0.5          # dot(W2, Z1)
    b1 = zeros((n_h, 1))                       # Column vector
    b2 = zeros((n_y, 1))                       # Column vector
    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}
    return parameters
#------------------------------------------------------------
# Forward propagattion
# X:row->sample;
# Z2:col->sample
def forward_propagate(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    Z1 = dot(W1, X.T) + b1                     # X:row-->sample; Z1:col-->sample
    # A1 = 1/(1+exp(-Z1))                      # unipolar sigmoid
    # A1 = (1-exp(-Z1))/(1+exp(-Z1))           # bipolar sigmoid, i.e. tanh(Z1/2)
    A1 = tanh(Z1)                              # hyperbolic tangent, matching the (1 - A1**2) derivative in backprop
    Z2 = dot(W2, A1) + b2                      # Z2:col-->sample
    # A2 = 1/(1+exp(-Z2))                      # A:col-->sample
    A2 = Z2                                    # Linear output
    cache = {'Z1':Z1,
             'A1':A1,
             'Z2':Z2,
             'A2':A2}
    return Z2, cache
#------------------------------------------------------------
# Calculate the cost
# A2,Y: col->sample
def calculate_cost(A2, Y, parameters):
    err = [x1 - x2 for x1, x2 in zip(A2.T, Y.T)]
    cost = [dot(e, e) for e in err]
    return mean(cost)
#------------------------------------------------------------
# Backward propagattion
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                             # Number of the samples
    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']
    dZ2 = (A2 - Y)                             #* (A2 * (1-A2)) -- not needed for the linear output
    dW2 = dot(dZ2, A1.T) / m
    db2 = sum(dZ2, axis=1, keepdims=True) / m
    # dZ1 = dot(W2.T, dZ2) * (A1 * (1-A1))     # derivative for the unipolar sigmoid
    dZ1 = dot(W2.T, dZ2) * (1 - A1**2)         # derivative of tanh
    dW1 = dot(dZ1, X) / m
    db1 = sum(dZ1, axis=1, keepdims=True) / m
    grads = {'dW1':dW1,
             'db1':db1,
             'dW2':dW2,
             'db2':db2}
    return grads
#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    W1 = W1 - learning_rate * dW1
    W2 = W2 - learning_rate * dW2
    b1 = b1 - learning_rate * db1
    b2 = b2 - learning_rate * db2
    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}
    return parameters
#------------------------------------------------------------
# Define the training
DISP_STEP = 2000
#------------------------------------------------------------
pltgif = PlotGIF()
#------------------------------------------------------------
def train(X, Y, num_iterations, learning_rate, print_cost=False):
    # random.seed(3)
    n_x = 1
    n_y = 1
    n_h = 20
    lr = learning_rate
    parameters = initialize_parameters(n_x, n_h, n_y)
    XX, YY = shuffledata(X, Y)
    costdim = []
    x = linspace(-0.5, 0.5, 250).reshape(-1, 1)
    for i in range(0, num_iterations):
        A2, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A2, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)
        if print_cost and i % DISP_STEP == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost)
            plt.clf()
            y, cache = forward_propagate(x, parameters)
            plt.plot(x.reshape(-1, 1), y.reshape(-1, 1))
            plt.xlabel("x")
            plt.ylabel("f(x)")
            plt.grid(True)
            plt.tight_layout()
            plt.draw()
            plt.pause(.1)
            pltgif.append(plt)
        if cost < 0.0001:
            break
        XX, YY = shuffledata(X, Y)
    return parameters, costdim
#------------------------------------------------------------
parameter,costdim = train(x_data, y_data, 200000, 0.5, True)
pltgif.save(r'd:\temp\1.gif')
#------------------------------------------------------------
plt.clf()
plt.plot(arange(len(costdim))*DISP_STEP, costdim)
plt.xlabel("Iteration")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()
#------------------------------------------------------------
# END OF FILE : HW13BP.PY
#============================================================
2. Replacing the hidden-layer transfer function with ReLU
Change the transfer function of the hidden layer of the network above to the ReLU function.
Modified parts of the program:
In forward_propagate(), A1 is now computed as:
A1 = Z1
A1[A1 < 0] = 0
In backward_propagate():
A11 = A1
A11[A11 > 0] = 1
A11[A11 < 0] = 0
dZ1 = dot(W2.T, dZ2) * A11
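Note that both fragments modify Z1 / A1 in place through aliasing. An equivalent, non-aliasing way to write the ReLU value and its derivative mask (a sketch, not the code actually used in the listing):
import numpy as np

Z1 = np.array([[-1.0, 0.5], [2.0, -0.3]])        # example hidden pre-activations
A1 = np.maximum(Z1, 0.0)                         # ReLU forward value, Z1 left untouched
A11 = (Z1 > 0).astype(float)                     # ReLU derivative: 1 where Z1 > 0, else 0
# dZ1 = dot(W2.T, dZ2) * A11                     # then used exactly as in backward_propagate()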
▲ Evolution of the network's input-output relationship
▲ Training error curve when approximating a single-period sine function
Judging from the results above, with the same number of hidden nodes the ReLU network leaves a noticeably larger approximation error.
To improve the approximation, the number of hidden neurons is increased to 50; the results are shown below:
▲ Network convergence with 50 hidden nodes
▲ Error convergence
Below is the process of approximating four periods of the sine function.
▲ Approximating four periods of the sine function
▲ Training error convergence
Below is the convergence process when approximating six periods of the sine function.
▲ Evolution of the input-output relationship when approximating the six-period sine signal
▲ Error convergence when approximating six periods of the sine function
Next, the number of hidden nodes is raised to 100 and the number of training samples to 250, still using the ReLU transfer function to approximate six periods of the sine function. The convergence is shown below.
Evolution of the network's input-output function as training proceeds:
▲ Evolution of the network's input-output relationship
Training error as a function of the number of training iterations:
▲ Error curve when approximating six periods of the sine function
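The listing below corresponds to the 250-sample, 50-node, four-period configuration; for the 100-node, six-period experiment just described, presumably only the target construction and the hidden-layer size change, e.g.:
import numpy as np

x_data = np.linspace(-0.5, 0.5, 250).reshape(-1, 1)           # 250 training samples
y_data = (np.sin(x_data * 6 * np.pi) * 0.8).reshape(1, -1)    # "six-period" target
n_h = 100                                                     # hidden-layer size set inside train()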
#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HW13BPRELU.PY -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================
from headm import *                  # author's helper module (presumably imports numpy, matplotlib.pyplot as plt, printf and PlotGIF)
#------------------------------------------------------------
# Samples data construction
x_data = linspace(-0.5, 0.5, 250).reshape(-1, 1)
y_data = (sin(x_data*4*pi)*0.8).reshape(1,-1)
#------------------------------------------------------------
'''
xx = linspace(-0.5,0.5, 500)
yy = sin(xx*6*pi) * 0.8
plt.plot(xx, yy, label='sin(2pix)')
plt.scatter(x_data.reshape(-1,1), y_data.reshape(-1,1), color='r')
plt.xlabel("x")
plt.ylabel("f(x)")
plt.grid(True)
plt.tight_layout()
plt.show()
'''
#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T
#------------------------------------------------------------
# Define and initialization NN
def initialize_parameters(n_x, n_h, n_y):
    random.seed(2)
    W1 = random.randn(n_h, n_x) * 0.5          # dot(W1, X.T)
    W2 = random.randn(n_y, n_h) * 0.5          # dot(W2, Z1)
    b1 = zeros((n_h, 1))                       # Column vector
    b2 = zeros((n_y, 1))                       # Column vector
    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}
    return parameters
#------------------------------------------------------------
# Forward propagattion
# X:row->sample;
# Z2:col->sample
def forward_propagate(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    Z1 = dot(W1, X.T) + b1                     # X:row-->sample; Z1:col-->sample
    # A1 = 1/(1+exp(-Z1))                      # unipolar sigmoid
    # A1 = (1-exp(-Z1))/(1+exp(-Z1))           # bipolar sigmoid / tanh
    # A1 = array([[x if x > 0 else 0 for x in l] for l in Z1])
    A1 = Z1                                    # ReLU; note A1 aliases Z1, so the cached Z1 is clipped too
    A1[A1 < 0] = 0
    Z2 = dot(W2, A1) + b2                      # Z2:col-->sample
    # A2 = 1/(1+exp(-Z2))                      # A:col-->sample
    A2 = Z2                                    # Linear output
    cache = {'Z1':Z1,
             'A1':A1,
             'Z2':Z2,
             'A2':A2}
    return Z2, cache
#------------------------------------------------------------
# Calculate the cost
# A2,Y: col->sample
def calculate_cost(A2, Y, parameters):
    err = [x1 - x2 for x1, x2 in zip(A2.T, Y.T)]
    cost = [dot(e, e) for e in err]
    return mean(cost)
#------------------------------------------------------------
# Backward propagattion
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                             # Number of the samples
    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']
    dZ2 = (A2 - Y)                             #* (A2 * (1-A2)) -- not needed for the linear output
    dW2 = dot(dZ2, A1.T) / m
    db2 = sum(dZ2, axis=1, keepdims=True) / m
    # dZ1 = dot(W2.T, dZ2) * (A1 * (1-A1))     # unipolar sigmoid derivative
    # dZ1 = dot(W2.T, dZ2) * (1-A1**2)         # tanh derivative
    # A11 = array([[1 if x >= 0 else 0 for x in l] for l in A1])
    A11 = A1                                   # ReLU derivative mask (aliases A1, which is not reused after this point)
    A11[A11 > 0] = 1
    A11[A11 < 0] = 0
    dZ1 = dot(W2.T, dZ2) * A11
    dW1 = dot(dZ1, X) / m
    db1 = sum(dZ1, axis=1, keepdims=True) / m
    grads = {'dW1':dW1,
             'db1':db1,
             'dW2':dW2,
             'db2':db2}
    return grads
#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    W1 = W1 - learning_rate * dW1
    W2 = W2 - learning_rate * dW2
    b1 = b1 - learning_rate * db1
    b2 = b2 - learning_rate * db2
    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}
    return parameters
#------------------------------------------------------------
# Define the training
DISP_STEP = 1000
#------------------------------------------------------------
pltgif = PlotGIF()
#------------------------------------------------------------
def train(X, Y, num_iterations, learning_rate, print_cost=False):
    # random.seed(3)
    n_x = 1
    n_y = 1
    n_h = 50
    lr = learning_rate
    parameters = initialize_parameters(n_x, n_h, n_y)
    XX, YY = shuffledata(X, Y)
    costdim = []
    x = linspace(-0.5, 0.5, 250).reshape(-1, 1)
    for i in range(0, num_iterations):
        A2, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A2, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)
        if print_cost and i % DISP_STEP == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost)
            plt.clf()
            y, cache = forward_propagate(x, parameters)
            plt.plot(x.reshape(-1, 1), y.reshape(-1, 1))
            plt.xlabel("x")
            plt.ylabel("f(x)")
            plt.grid(True)
            plt.tight_layout()
            plt.draw()
            plt.pause(.1)
            pltgif.append(plt)
        if cost < 0.0001:
            break
        XX, YY = shuffledata(X, Y)
    return parameters, costdim
#------------------------------------------------------------
parameter,costdim = train(x_data, y_data, 100000, 0.5, True)
pltgif.save(r'd:\temp\1.gif')
#------------------------------------------------------------
plt.clf()
plt.plot(arange(len(costdim))*DISP_STEP, costdim)
plt.xlabel("Iteration")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()
#------------------------------------------------------------
# END OF FILE : HW13BPRELU.PY
#============================================================
Source: zhuoqing.blog.csdn.net, author: 卓晴 (Zhuo Qing). Copyright belongs to the original author; please contact the author before reposting.
Original article: zhuoqing.blog.csdn.net/article/details/109757052