Summary of Char-CNN Performance Tuning Experience
Code Source
Runtime Environment
NPU: 1× Ascend 910
GPU: 1× NVIDIA V100 16GB
Results
NPU Results
Training log:
Epoch: 10
Iter: 14200, Train Loss: 0.067, Train Acc: 97.27%, Val Loss: 0.27, Val Acc: 94.00%, Time: 0:00:08.664672
Iter: 14300, Train Loss: 0.044, Train Acc: 98.24%, Val Loss: 0.27, Val Acc: 93.94%, Time: 0:00:06.487752
Iter: 14400, Train Loss: 0.05, Train Acc: 98.05%, Val Loss: 0.25, Val Acc: 94.10%, Time: 0:00:06.521828
Iter: 14500, Train Loss: 0.043, Train Acc: 98.05%, Val Loss: 0.27, Val Acc: 94.19%, Time: 0:00:06.634408
Iter: 14600, Train Loss: 0.03, Train Acc: 98.24%, Val Loss: 0.26, Val Acc: 94.12%, Time: 0:00:06.569271
Iter: 14700, Train Loss: 0.065, Train Acc: 97.66%, Val Loss: 0.27, Val Acc: 93.90%, Time: 0:00:06.557436
Iter: 14800, Train Loss: 0.047, Train Acc: 98.63%, Val Loss: 0.25, Val Acc: 94.10%, Time: 0:00:06.429037
Iter: 14900, Train Loss: 0.057, Train Acc: 98.05%, Val Loss: 0.25, Val Acc: 94.04%, Time: 0:00:06.659380
Iter: 15000, Train Loss: 0.036, Train Acc: 98.63%, Val Loss: 0.26, Val Acc: 94.03%, Time: 0:00:06.545762
Iter: 15100, Train Loss: 0.045, Train Acc: 98.24%, Val Loss: 0.25, Val Acc: 93.98%, Time: 0:00:06.593108
Iter: 15200, Train Loss: 0.058, Train Acc: 97.27%, Val Loss: 0.27, Val Acc: 93.90%, Time: 0:00:06.456884
Iter: 15300, Train Loss: 0.034, Train Acc: 98.44%, Val Loss: 0.25, Val Acc: 94.21%, Time: 0:00:06.499991
Iter: 15400, Train Loss: 0.072, Train Acc: 97.27%, Val Loss: 0.26, Val Acc: 94.10%, Time: 0:00:06.639853
Iter: 15500, Train Loss: 0.046, Train Acc: 97.85%, Val Loss: 0.25, Val Acc: 94.13%, Time: 0:00:06.548678
Iter: 15600, Train Loss: 0.062, Train Acc: 98.05%, Val Loss: 0.25, Val Acc: 94.13%, Time: 0:00:06.573635
all epoch time: 1128.5640110969543
step time: 13.8937622020741
Test results:
Loading test data...
Testing...
Test Loss: 0.17, Test Acc: 94.83%
Precision, Recall and F1-Score...
precision recall f1-score support
体育 0.99 0.99 0.99 2632
财经 0.90 0.89 0.89 742
房产 1.00 0.98 0.99 401
家居 0.94 0.94 0.94 652
教育 0.95 0.92 0.94 839
科技 0.94 0.95 0.95 3259
时尚 0.91 0.93 0.92 267
时政 0.90 0.93 0.92 1262
游戏 0.95 0.90 0.93 487
娱乐 0.96 0.97 0.97 1853
彩票 0.93 0.96 0.94 152
股票 0.95 0.96 0.96 3088
社会 0.90 0.88 0.89 1017
星座 0.95 0.83 0.89 72
accuracy 0.95 16723
macro avg 0.94 0.93 0.94 16723
weighted avg 0.95 0.95 0.95 16723
Confusion Matrix...
[[2599 1 0 0 1 4 2 6 0 8 4 1 6 0]
[ 0 657 0 3 0 5 0 9 0 3 0 62 3 0]
[ 0 1 393 6 0 0 0 1 0 0 0 0 0 0]
[ 1 4 0 612 2 10 8 4 0 2 0 6 3 0]
[ 2 1 0 2 773 12 1 16 0 6 0 2 23 1]
[ 6 5 0 9 6 3105 0 26 18 14 1 36 33 0]
[ 1 1 0 4 1 4 248 1 0 5 0 0 1 1]
[ 4 4 0 1 11 17 3 1177 0 4 1 26 14 0]
[ 1 0 0 0 0 42 0 1 440 1 0 2 0 0]
[ 7 4 0 1 3 13 6 6 2 1794 0 2 15 0]
[ 3 0 0 0 0 0 0 0 0 0 146 0 3 0]
[ 0 51 0 2 0 44 0 32 0 0 0 2957 2 0]
[ 5 2 0 7 12 37 0 25 1 22 5 3 897 1]
[ 0 0 0 1 1 2 4 1 1 2 0 0 0 60]]
Time usage: 0:00:19.445022
GPU Results
Epoch: 10
Iter: 14200, Train Loss: 0.04, Train Acc: 98.63%, Val Loss: 0.26, Val Acc: 94.39%, Time: 0:00:09.938262
Iter: 14300, Train Loss: 0.062, Train Acc: 98.24%, Val Loss: 0.26, Val Acc: 94.22%, Time: 0:00:08.936174
Iter: 14400, Train Loss: 0.045, Train Acc: 98.83%, Val Loss: 0.26, Val Acc: 94.29%, Time: 0:00:08.980429
Iter: 14500, Train Loss: 0.027, Train Acc: 99.22%, Val Loss: 0.26, Val Acc: 94.17%, Time: 0:00:08.994467
Iter: 14600, Train Loss: 0.044, Train Acc: 97.85%, Val Loss: 0.26, Val Acc: 94.12%, Time: 0:00:09.027310
Iter: 14700, Train Loss: 0.024, Train Acc: 99.22%, Val Loss: 0.26, Val Acc: 94.24%, Time: 0:00:08.996118
Iter: 14800, Train Loss: 0.05, Train Acc: 98.44%, Val Loss: 0.26, Val Acc: 94.02%, Time: 0:00:09.029359
Iter: 14900, Train Loss: 0.04, Train Acc: 98.44%, Val Loss: 0.25, Val Acc: 94.25%, Time: 0:00:09.029142
Iter: 15000, Train Loss: 0.048, Train Acc: 98.63%, Val Loss: 0.26, Val Acc: 94.07%, Time: 0:00:09.031927
Iter: 15100, Train Loss: 0.079, Train Acc: 96.88%, Val Loss: 0.26, Val Acc: 94.35%, Time: 0:00:09.032181
Iter: 15200, Train Loss: 0.035, Train Acc: 99.02%, Val Loss: 0.24, Val Acc: 94.06%, Time: 0:00:09.009405
Iter: 15300, Train Loss: 0.066, Train Acc: 97.07%, Val Loss: 0.24, Val Acc: 94.22%, Time: 0:00:08.980669
Iter: 15400, Train Loss: 0.057, Train Acc: 97.66%, Val Loss: 0.25, Val Acc: 94.22%, Time: 0:00:09.033201
Iter: 15500, Train Loss: 0.05, Train Acc: 98.24%, Val Loss: 0.25, Val Acc: 94.24%, Time: 0:00:08.947044
Iter: 15600, Train Loss: 0.068, Train Acc: 97.07%, Val Loss: 0.24, Val Acc: 94.39%, Time: 0:00:09.040133
all epoch time: 1428.1370315551758
step time: 10.979338574341986
Test results:
Configuring CNN model...
Loading test data...
Testing...
Test Loss: 0.17, Test Acc: 94.84%
Precision, Recall and F1-Score...
precision recall f1-score support
体育 0.99 0.99 0.99 2632
财经 0.93 0.86 0.90 742
房产 1.00 0.99 0.99 401
家居 0.95 0.95 0.95 652
教育 0.94 0.92 0.93 839
科技 0.95 0.95 0.95 3259
时尚 0.92 0.93 0.92 267
时政 0.92 0.91 0.92 1262
游戏 0.96 0.90 0.93 487
娱乐 0.96 0.97 0.96 1853
彩票 0.96 0.94 0.95 152
股票 0.95 0.97 0.96 3088
社会 0.87 0.91 0.89 1017
星座 0.94 0.88 0.91 72
accuracy 0.95 16723
macro avg 0.94 0.93 0.94 16723
weighted avg 0.95 0.95 0.95 16723
Confusion Matrix...
[[2596 0 0 0 4 4 2 3 1 13 2 1 6 0]
[ 0 640 0 3 1 6 1 7 0 1 0 75 8 0]
[ 0 0 396 2 0 1 0 0 0 1 0 1 0 0]
[ 2 2 0 620 3 12 4 1 0 3 0 5 0 0]
[ 2 2 0 1 775 9 3 11 0 4 0 3 28 1]
[ 6 3 0 11 8 3085 0 28 13 23 1 43 38 0]
[ 0 0 0 4 2 2 248 3 0 6 0 0 0 2]
[ 5 5 0 2 11 16 3 1154 1 2 1 34 28 0]
[ 2 0 0 0 2 36 1 0 438 2 0 3 3 0]
[ 6 1 0 2 3 12 5 6 2 1792 0 2 21 1]
[ 3 0 0 0 1 0 0 0 0 0 143 0 5 0]
[ 0 33 0 1 0 47 0 20 0 0 0 2982 5 0]
[ 3 2 0 5 12 21 0 22 1 20 2 1 928 0]
[ 0 0 0 2 1 1 3 1 0 0 0 0 1 63]]
Time usage: 0:00:11.074670
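Putting the two runs side by side, the end-to-end training-time difference over the same 10 epochs can be computed directly from the "all epoch time" values reported above:

```python
# Wall-clock totals copied from the two training logs above, in seconds.
npu_total = 1128.5640110969543   # "all epoch time" on Ascend 910
gpu_total = 1428.1370315551758   # "all epoch time" on V100

# End-to-end ratio: how much longer the GPU run took than the NPU run.
speedup = gpu_total / npu_total
print(f"NPU end-to-end speedup over GPU: {speedup:.2f}x")
```

Test accuracy is essentially identical in both runs (94.83% vs 94.84%), so the comparison is on time alone.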
Tuning Approach
When I first started optimizing, I did not dare to change any parameters. My initial plan was to try autotune for automatic operator tuning, but it appeared that dynamic shapes were not supported, so the only option left was to read the profiling files. After dumping the profiling data and skimming through it, the overall performance looked reasonable. The report also showed that the Adam optimizer was taking a relatively long time, but the other optimizers I tried performed about the same, so I kept Adam in the end. Adam is the recommended optimizer for many models anyway, so it did not seem worth optimizing away.
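Adam's relatively high per-step cost is not surprising: unlike plain SGD, every update maintains and bias-corrects two moment buffers per parameter. A minimal NumPy sketch of the update rule (an illustration of the algorithm, not the Ascend kernel) makes this visible:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: two exponential moving averages plus bias
    correction, so each step reads/writes more state than plain SGD."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return w, m, v

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 4):                            # three steps, constant gradient
    w, m, v = adam_step(w, np.array([0.5, -0.5]), m, v, t)
```

The two extra buffers `m` and `v` are exactly the state that a lighter optimizer such as vanilla SGD does not carry.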
I then shifted my attention to swapping out the loss function, optimizer, and activation function. For the activation I tested four candidates: sigmoid, relu, softmax, and softplus, and compared the behavior under each. In the end sigmoid and relu differed very little: sigmoid may be marginally faster, but possibly at some cost in accuracy, so I switched back to relu.
While the activation was set to sigmoid, I also tried replacing tf.nn.softmax_cross_entropy_with_logits with tf.nn.sigmoid_cross_entropy_with_logits, but likewise the effect was not very noticeable.
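The two losses do make different assumptions: softmax cross-entropy treats the classes as mutually exclusive (one softmax over all logits), while sigmoid cross-entropy scores each class as an independent binary decision. A NumPy sketch of the per-example, numerically stable forms of both:

```python
import numpy as np

def softmax_xent(logits, labels):
    """Mutually exclusive classes: one softmax over all logits
    (what tf.nn.softmax_cross_entropy_with_logits computes per example)."""
    shifted = logits - logits.max()                      # stability shift
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -(labels * log_probs).sum()

def sigmoid_xent(logits, labels):
    """Independent per-class binary decisions
    (tf.nn.sigmoid_cross_entropy_with_logits), summed over classes."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits)))).sum()

logits = np.array([2.0, 0.5, -1.0])
labels = np.array([1.0, 0.0, 0.0])   # one-hot target
soft_loss = softmax_xent(logits, labels)
sig_loss = sigmoid_xent(logits, labels)
```

For a single-label task like this one (each news article belongs to exactly one category), the softmax form matches the problem, which may be why the sigmoid variant brought no benefit.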
After some thought, I tried adjusting the number of convolution filters and fully-connected neurons. After repeated trials, the combination of 1024 and 512 seemed to work best: going smaller was faster but cost accuracy, while going larger barely improved accuracy and slowed things down. So I settled on:
num_filters = 1024 # number of convolution filters
hidden_dim = 512 # fully-connected hidden units
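These two knobs dominate the model's weight count, which is why they trade speed against accuracy so directly. A rough estimate for a single-conv TextCNN layout (the embedding dimension, kernel size, and class count below are assumed defaults for illustration, not values taken from the original repo):

```python
def cnn_param_count(num_filters, hidden_dim,
                    embed_dim=64, kernel_size=5, num_classes=14):
    """Rough weight count for a single-conv TextCNN; embed_dim,
    kernel_size and num_classes are hypothetical defaults."""
    conv = kernel_size * embed_dim * num_filters + num_filters  # 1-D conv + bias
    fc1 = num_filters * hidden_dim + hidden_dim                 # pooled -> hidden
    fc2 = hidden_dim * num_classes + num_classes                # hidden -> classes
    return conv + fc1 + fc2

big = cnn_param_count(1024, 512)    # the 1024/512 combination above
small = cnn_param_count(512, 256)   # the later 512/256 combination
```

Under these assumptions the 1024/512 network carries nearly three times the weights of the 512/256 one, so halving both knobs shrinks the conv and FC layers substantially.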
I did of course try switching to the NPU version of dropout, but that actually made the run slower. I later filed an issue asking about the difference between the two dropout implementations in TF and now roughly understand it.
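For reference, the standard tf.nn.dropout implements "inverted" dropout: units are zeroed at training time and the survivors are scaled up so the expected activation is unchanged. A NumPy sketch of that behavior (an illustration only, not either framework's kernel):

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Inverted dropout as in tf.nn.dropout: drop units with probability
    (1 - keep_prob) and scale survivors by 1/keep_prob, so that
    E[output] == input and no rescaling is needed at inference time."""
    mask = rng.random(x.shape) < keep_prob   # Bernoulli keep mask
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 1000))
y = inverted_dropout(x, keep_prob=0.5, rng=rng)
```

With `keep_prob=0.5` each surviving unit is doubled, so the mean activation stays close to the input's mean.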
Then I noticed that tf.nn.softmax_cross_entropy_with_logits has a v2 variant, so I replaced cross_entropy = tf.nn.softmax_cross_entropy_with_logits with cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2 to see what would happen. (The two compute the same loss for constant one-hot labels; v2 additionally allows gradients to flow into the labels.) With the original parameters this seemed to bring a small improvement, so I adjusted the parameters once more:
num_filters = 512 # number of convolution filters
hidden_dim = 256 # fully-connected hidden units
This run showed another small overall improvement over the previous one.
Because my knowledge is still limited, the one thing I have not dared to try in this whole tuning process is replacing operators. When I have more time and a better understanding, I plan to experiment with operator substitution as well.