
张辉 (OP)
Updated 2021-04-08 12:54:22

[Tech Share] [Ascend CANN Training Camp Session 1, Model Track] 0407 Homework Notes

In this lesson, Teacher Zhong (钟老师) showed how to use PyCharm to run the TensorFlow LeNet model on ModelArts.

So let's give it a try.

First, you need PyCharm.

张小白 already had a copy of PyCharm, but it had expired, so it was off to https://www.jetbrains.com/pycharm/download/#section=s to download the Community Edition:

[screenshot]

Wait patiently for the download to finish.

Run the downloaded .exe to install:

[screenshot]

Click Next all the way through until the installation completes.

[screenshot]

When it is done, click the PyCharm icon on the desktop to launch it:

[screenshot]

Choose Open and import the project from the previous lesson:

[screenshot]

Following the previous homework (https://bbs.huaweicloud.com/forum/thread-117665-1-1.html), point PyCharm at the same Python interpreter as before:

[screenshot]

Confirm that it is the GPU build of TensorFlow 1.15.
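A quick way to double-check this from the PyCharm Python console (a minimal sketch using the TensorFlow 1.x API):

import tensorflow as tf

# On the GPU build of TensorFlow 1.15 this should print 1.15.x and True
print(tf.__version__)
print(tf.test.is_gpu_available())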

Try running LeNet locally first:

[screenshot]

Looks fine so far:

[screenshot]

After the run finishes, the accuracy is already quite high:

[screenshot]

What we need to do next is:

(1) Upload the dataset files to the server (OBS) first.

(2) Install and configure the ModelArts plugin (PyCharm ToolKit) for PyCharm.

(3) Create a training job from PyCharm and run the LeNet training there.




张辉 (Floor 2)
Updated 2021-04-08 13:05:19

Use obs_browser_plus (OBS Browser+):

[screenshot]

Enter the account name, AK, and SK, then click Log In:

[screenshot]

Once logged in, click Create Bucket:

[screenshot]

Here you can check which OBS package you purchased earlier; either multi-AZ or single-AZ works. Then enter the bucket name (cann-lenet) and click OK:

[screenshot]

You can see that the bucket has been created.

Next, upload the MNIST directory under the local E:\CANN\lenet folder into the cann-lenet bucket.

Click the bucket name:

[screenshot]

Click Upload:

[screenshot]

Choose Add Folder:

[screenshot]

The system will prompt for confirmation:

[screenshot]

Click Yes to confirm.

You can open Task Management to check the transfer status.

[screenshot]




张辉 (Floor 3)
Updated 2021-04-08 13:44:42

Go to https://console.huaweicloud.com/modelarts/?region=cn-north-4#/dashboard and download the PyCharm ToolKit:

[screenshot]

At this point you can install the plugin by following section 3, "Enter PyCharm Kit", of "MindSpore 21-Day Bootcamp (5): CTR Prediction with Wide&Deep Using PyCharm Kit" (https://bbs.huaweicloud.com/blogs/207322). I won't repeat the details here; the screenshots tell the story:

[screenshots]

Now let's create a training job:

[screenshot]

Looking back at the training script, it originally read the dataset files straight from the local disk.

Now the dataset lives in OBS, so what we need to do is copy it from OBS into the ModelArts environment, as sketched below:
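A minimal sketch of that copy step, assuming the script reads the --data_url argument that ModelArts passes in (the job log further down shows --data_url=s3://cann-lenet/MNIST_data/) and that MoXing is available in the training image; MNIST_data/ is the local directory the script already reads:

import argparse
import moxing as mox

parser = argparse.ArgumentParser()
parser.add_argument("--data_url", type=str, default="obs://cann-lenet/MNIST_data/")
parser.add_argument("--train_url", type=str, default="")
args, _ = parser.parse_known_args()

# Pull the MNIST files from OBS into the container before training starts
mox.file.copy_parallel(src_url=args.data_url, dst_url="MNIST_data/")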

[screenshot]

Wait patiently for the run to finish.

[screenshot]

With that, the run on ModelArts is done as well.



张辉 (Floor 4)
Posted 2021-04-08 16:04:01

However, it looks like I picked the wrong environment above: a GPU environment, which has nothing whatsoever to do with CANN.

Let's change that:

[screenshot]

In addition, the homework also asks us to copy the training results back to OBS.

We know the training results are written to the checkpoint directory in the ModelArts environment (see config.py):

PARAMETER_FILE = "checkpoint/variable.ckpt"

Create a new folder named ckpt in OBS (deliberately using a different directory name to tell the two apart).

Then add one copy statement to copy the whole checkpoint directory over:

mox.file.copy_parallel(src_url="checkpoint/", dst_url="obs://cann-lenet/ckpt")
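For context, here is a minimal sketch of where that line could sit at the very end of Train.py, after saver.save() has written the checkpoint files; the directory names are the ones used in this post, and the os.path.isdir guard is just an extra precaution I added, not something from the original script:

import os
import moxing as mox

CKPT_DIR = "checkpoint"                  # where config.py's PARAMETER_FILE points
OBS_CKPT_DIR = "obs://cann-lenet/ckpt"   # the ckpt folder just created in OBS

# After saver.save(sess, PARAMETER_FILE), ship everything back to OBS
if os.path.isdir(CKPT_DIR):
    mox.file.copy_parallel(src_url=CKPT_DIR, dst_url=OBS_CKPT_DIR)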

[screenshot]

The log looks like this:

[screenshot]

This run went for 5000 steps, and the accuracy ended up around 0.9.

Back in OBS, you can see the copied model files sitting in the ckpt directory:

[screenshot]



张辉 (Floor 5)
Posted 2021-04-08 16:07:04

The job's run logs are uploaded automatically to the "<training job name>"/log directory, which here is:

obs://cann-lenet/MA-LeNet-04-08-13-28/log/
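If you want to peek at those logs without opening the OBS console, MoXing's file API can read OBS paths directly from a ModelArts environment; a rough sketch, assuming mox.file.list_directory and mox.file.copy_parallel behave as described in the MoXing docs:

import moxing as mox

LOG_DIR = "obs://cann-lenet/MA-LeNet-04-08-13-28/log/"

# List the log files, then pull the whole directory down for a local look
for name in mox.file.list_directory(LOG_DIR):
    print(name)
mox.file.copy_parallel(src_url=LOG_DIR, dst_url="/tmp/ma-lenet-log/")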

[screenshot]


张辉 (Floor 6)
Posted 2021-04-08 17:01:05

Personal suggestion: get everything working locally first, and only then test on ModelArts, because training jobs cost real money.

The only difference between ModelArts and a local run is a few extra copies back and forth anyway.



张辉 (Floor 7)
Posted 2021-04-09 21:04:27

Uh oh. Teacher Zhong pointed out explicitly that the steps above were done wrong. When working on day two, had 张小白 already forgotten everything from day one's lesson?

Yes.

LeNet may be small, but it still uses sess.run.

And sess.run code has to be migrated for the NPU...

So...

from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig  # needed for RewriterConfig.OFF below
# Note: the npu_bridge plugin shipped in the Ascend image must be importable so that "NpuOptimizer" is registered.

# sess = tf.Session()
# Modified for the CANN sess.run migration, see https://support.huaweicloud.com/tensorflowdevg-cann330alphaXtraining/atlasmprtg_13_0009.html
config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # remapping must be explicitly disabled
sess = tf.Session(config=config)

This code absolutely has to be added (it replaces the original sess = tf.Session() near the top of Train.py)...

张小白 got it wrong...

So let's add it and run again.

[screenshot]

The ModelArts event log:

2021/04/09 20:54:58  Begin to check training configuration.
2021/04/09 20:55:03  Begin to upload training code.
2021/04/09 20:55:03  MA-LeNet-04-08-13-28/code/Train.py: 100%
2021/04/09 20:55:03  MA-LeNet-04-08-13-28/code/.git/objects/1a/5ed55f1747f270f97cfa708bac8d10771698eb: 100%
2021/04/09 20:55:03  MA-LeNet-04-08-13-28/code/.git/objects/84/5bdbb460b900ee67fabb5a616a3eb17b403b64: 100%
2021/04/09 20:55:03  MA-LeNet-04-08-13-28/code/.git/objects/4a/64d5b4fac48b8847eb08ad22c868499199e598: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/.git/objects/ec/f56fc488be609e1bbb6ac0abe83ea6fd15da81: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/.git/objects/bc/a582d83e460d262193b91a8d4eba481ce2d2f1: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/checkpoint/variable.ckpt.data-00000-of-00001: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/.git/objects/f1/3a6576563cb7a6fd380457ff2037e13f81e915: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/.git/objects/24/520b933c516d580caa53ee8a31f6adba027779: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/.git/hooks/prepare-commit-msg.sample: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/.git/objects/5a/ce8ea93f8d2a3741f4d267954e2ad37e1b3a39: 63%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/.git/objects/5a/ce8ea93f8d2a3741f4d267954e2ad37e1b3a39: 100%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 10%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 21%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 31%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 42%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 52%
2021/04/09 20:55:04  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 63%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 74%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 84%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 95%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/MNIST_data/train-images-idx3-ubyte.gz: 100%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/.git/logs/refs/remotes/origin/HEAD: 100%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/.git/objects/90/574a59a1250423e954b80ee090626484e8fd3a: 100%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/.git/objects/15/48ce5122be8c1394894c33b161891303c7692c: 100%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/.git/objects/6f/63a63ccb633131fa1e523f4743d2a8b5da7155: 100%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 10%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 21%
2021/04/09 20:55:05  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 31%
2021/04/09 20:55:06  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 42%
2021/04/09 20:55:06  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 52%
2021/04/09 20:55:06  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 63%
2021/04/09 20:55:06  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 74%
2021/04/09 20:55:06  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 84%
2021/04/09 20:55:06  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 95%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/b5/0e4b6bccdebde3d57f575c7fbeb24bec277f10: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/03/d9549ea8e4ada36fb3ecbc30fef08175b7d728: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/54/cb69c5171cbdde593f0faf218188356fb72fd4: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/7d/e0b44635aeb4bf0f2fff3985e713cd988a3adb: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/ea/d8d1f073d82256ab636c4819cb1a0f5b1cf309: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/checkpoint/variable.ckpt.meta: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/config: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/index: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/logs/HEAD: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/MNIST_data/train-labels-idx1-ubyte.gz: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/5c/278eab9fa125d059f0321fb41a9ca6cdb4d0e2: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/checkpoint/variable.ckpt.index: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/6f/02b0a737c3aa882f1a667c70642fe43ddcd8bf: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/c1/bbd4789f0853136fbcdd84ab2585282502412d: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/16/d7a5202c5a5f47b4f74e5bab131c75db0e2c73: 100%
2021/04/09 20:55:07  MA-LeNet-04-08-13-28/code/.git/objects/db/8d7714623cc14487bde8778d1a5302bfe2305b: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/14/3886d78c45714f8bc15e09af3b2ca0c4084674: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/logs/refs/heads/master: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/1b/e11a16ffdb6a1c77fa90287b29c0182f30221d: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/32/6b2f2913bdcaf4ed74d747f1235e55d38baba0: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/6a/0cca30641a8f4da6e21591e5e96f0bf42a39c9: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/lenet.py: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/9b/1c726e77a3a3bcaa169714968328cae0920a58: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/70/7a576bb523304d5b674de436c0779d77b7d480: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/packed-refs: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/68/52a520f28ff85a3b752e6b0d1e41a4155d54e1: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/7a/4175fbfff031a28bbf8159030ac2f0b09d5610: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/5c/ce228d8a438649be2632518d48fa0b3880a048: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/9d/592d9b0c7be91abc7bdaa7e1dd24cfe697d7b5: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/checkpoint/checkpoint: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/objects/86/8fb097f25af17e29ba4dd3b9a10c9f346a24e2: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/refs/heads/master: 100%
2021/04/09 20:55:08  MA-LeNet-04-08-13-28/code/.git/hooks/pre-receive.sample: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/objects/6d/cf4db1212c0cff5af1b932c886b53b7b49609f: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/hooks/pre-push.sample: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/hooks/pre-rebase.sample: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/objects/94/a25f7f4cb416c083d265558da75d457237d671: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/README.md: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/hooks/post-update.sample: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/objects/aa/4e40b85f18ad61a7fc994edcbfa7cdd12937c9: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/info/exclude: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.gitignore: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/hooks/pre-applypatch.sample: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/objects/30/c742279d46b14cea35b777ae70afb3c52ce2e0: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/hooks/applypatch-msg.sample: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/description: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/Inference.py: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/objects/ef/86f5aaff23fba341b8bc8adb23b12479158e2f: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/MNIST_data/t10k-labels-idx1-ubyte.gz: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/objects/ea/7407ceab7175f23065a5e89e78cf094b224e36: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/.git/objects/67/ee8bbe74652b07cf93b6b94c1fc21a047a4e2c: 100%
2021/04/09 20:55:09  MA-LeNet-04-08-13-28/code/MNIST_data/t10k-images-idx3-ubyte.gz: 63%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/MNIST_data/t10k-images-idx3-ubyte.gz: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/ff/353a9b013498e79f1d8ebfbca22afa3e6f24b4: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/e7/aba12eb5e76500ebe936dac3b6fc6ab08203d4: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/hooks/update.sample: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/LICENSE: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/config.py: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/d6/444f5cc9d9a3aa5cf5f24069000a0ad00dee50: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/hooks/commit-msg.sample: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/refs/remotes/origin/HEAD: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/__pycache__/config.cpython-37.pyc: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/72/364f99fe4bf8d5262df3b19b33102aeaa791e5: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/hooks/pre-commit.sample: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/8c/b3aaf60227c28f4387295663ea23dc6688eb89: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/73/47ceddefc24bf608d038a757fdaf0b4e446a06: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/59/ba7a2b8aeed62b8fddc5dd69c1f7640a66f3f4: 100%
2021/04/09 20:55:10  MA-LeNet-04-08-13-28/code/.git/objects/fd/e0b8261fbaa4a52494caf97f722ba04b65ffe4: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/hooks/pre-merge-commit.sample: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/86/ac05bbfdb75422745055d18199515bf83f98c2: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/72/3564a27d9a6ce7fbfb104d01773d6f621b7655: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/a8/2fab49f0be65ddb1e1c7635ae2ef461d7196b8: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/f9/27f79cae958f2cafd631778a10502614e68052: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/a7/e141541c1d08d3f2ed01eae03e644f9e2fd0c5: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/59/fa08c21d70aa1e6cc23e9f15a2068e3aad12fd: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/hooks/fsmonitor-watchman.sample: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/HEAD: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/b0/e5a2b3d05ee5ea3e5599b3040f6e81fa54b97b: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/__pycache__/lenet.cpython-37.pyc: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/12/33b9aacd335b94a01e1ee3216f43d797d7d140: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/7b/c30efb3c0056d92782dab8345474eac1ce9826: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/.git/objects/90/9dd189e990bd84ca2e4f830f2d81dfd7428591: 100%
2021/04/09 20:55:11  MA-LeNet-04-08-13-28/code/UI.py: 100%
2021/04/09 20:55:11  Files are uploaded successfully.
2021/04/09 20:55:11  Begin to get training job pre version.
2021/04/09 20:55:12  Begin to create training job.
2021/04/09 20:55:13  Training job is created successfully.
2021/04/09 20:55:13  Job id: 767077, version id: 1494473
2021/04/09 20:55:13  Job name: MA-LeNet-04-08-13-28
2021/04/09 20:55:13  Version name: V0015
2021/04/09 20:55:13  Begin to get training job log.
-------------------------------
2021/04/09 20:57:16  Current training job status: Successful
2021/04/09 20:57:16  Training Duration: 00:01:16, Training Output Path: /cann-lenet/MA-LeNet-04-08-13-28/output/V0015/
2021/04/09 20:57:16  ModelArts Training is Finished.

The job log:

2021/04/09 20:55:14  Current training job status: Initializing
2021/04/09 20:55:31  Current training job status: Running
do nothing
[Modelarts Service Log]user: uid=1101(work) gid=1101(work) groups=1101(work),1000(HwHiAiUser)
[Modelarts Service Log]pwd: /home/work
[Modelarts Service Log]app_url: s3://cann-lenet/MA-LeNet-04-08-13-28/code/
[Modelarts Service Log]boot_file: code/Train.py
[Modelarts Service Log]log_url: /tmp/log/MA-LeNet-04-08-13-28.log
[Modelarts Service Log]command: code/Train.py --data_url=s3://cann-lenet/MNIST_data/ --train_url=s3://cann-lenet/MA-LeNet-04-08-13-28/output/V0015/
[Modelarts Service Log]local_code_dir: 
[Modelarts Service Log]Training start at 2021-04-09-20:57:06
[Modelarts Service Log][modelarts_create_log] modelarts-pipe found
[Modelarts Service Log]handle inputs of training job
INFO:root:Using MoXing-v1.17.3-8aa951bc
INFO:root:Using OBS-Python-SDK-3.20.7
[ModelArts Service Log][INFO][2021/04/09 20:57:06]: env MA_INPUTS is not found, skip the inputs handler
INFO:root:Using MoXing-v1.17.3-8aa951bc
INFO:root:Using OBS-Python-SDK-3.20.7
[ModelArts Service Log]2021-04-09 20:57:07,689 - modelarts-downloader.py[line:620] - INFO: Main: modelarts-downloader starting with Namespace(dst='./', recursive=True, skip_creating_dir=False, src='s3://cann-lenet/MA-LeNet-04-08-13-28/code/', trace=False, type='common', verbose=False)
[Modelarts Service Log][modelarts_logger] modelarts-pipe found
/home/work/user-job-dir
[Modelarts Service Log][modelarts_logger] modelarts-pipe found
INFO:root:Using MoXing-v1.17.3-8aa951bc
INFO:root:Using OBS-Python-SDK-3.20.7
[Modelarts Service Log]2021-04-09 20:57:09,945 - INFO - background upload stdout log to s3://cann-lenet/MA-LeNet-04-08-13-28/log/jobc08f1491-job-ma-lenet-04-08-13-28-0.log
[Modelarts Service Log]2021-04-09 20:57:09,953 - INFO - Ascend Driver: Version=20.2.0
[Modelarts Service Log]2021-04-09 20:57:09,954 - INFO - you are advised to use ASCEND_DEVICE_ID env instead of DEVICE_ID, as the DEVICE_ID env will be discarded in later versions
[Modelarts Service Log]2021-04-09 20:57:09,954 - INFO - particularly, ${ASCEND_DEVICE_ID} == ${DEVICE_ID}, it's the logical device id
[Modelarts Service Log]2021-04-09 20:57:09,954 - INFO - Davinci training command
[Modelarts Service Log]2021-04-09 20:57:09,954 - INFO - ['/usr/bin/python', '/home/work/user-job-dir/code/Train.py', '--data_url=s3://cann-lenet/MNIST_data/', '--train_url=s3://cann-lenet/MA-LeNet-04-08-13-28/output/V0015/']
[Modelarts Service Log]2021-04-09 20:57:09,954 - INFO - Wait for Rank table file ready
[Modelarts Service Log]2021-04-09 20:57:09,954 - INFO - Rank table file (K8S generated) is ready for read
[Modelarts Service Log]2021-04-09 20:57:09,955 - INFO - 
{
    "status": "completed",
    "group_count": "1",
    "group_list": [
        {
            "group_name": "job-ma-lenet-04-08-13-28",
            "device_count": "1",
            "instance_count": "1",
            "instance_list": [
                {
                    "pod_name": "jobc08f1491-job-ma-lenet-04-08-13-28-0",
                    "server_id": "192.168.0.136",
                    "devices": [
                        {
                            "device_id": "1",
                            "device_ip": "192.2.5.112"
                        }
                    ]
                }
            ]
        }
    ]
}
[Modelarts Service Log]2021-04-09 20:57:09,955 - INFO - Rank table file (C7x)
[Modelarts Service Log]2021-04-09 20:57:09,955 - INFO - 
{
    "status": "completed",
    "version": "1.0",
    "server_count": "1",
    "server_list": [
        {
            "server_id": "192.168.0.136",
            "device": [
                {
                    "device_id": "1",
                    "device_ip": "192.2.5.112",
                    "rank_id": "0"
                }
            ]
        }
    ]
}
[Modelarts Service Log]2021-04-09 20:57:09,956 - INFO - Rank table file (C7x) is generated
[Modelarts Service Log]2021-04-09 20:57:09,956 - INFO - Current server
[Modelarts Service Log]2021-04-09 20:57:09,956 - INFO - 
{
    "server_id": "192.168.0.136",
    "device": [
        {
            "device_id": "1",
            "device_ip": "192.2.5.112",
            "rank_id": "0"
        }
    ]
}
[Modelarts Service Log]2021-04-09 20:57:09,957 - INFO - bootstrap proc-rank-0-device-0
[Modelarts Service Log]2021-04-09 20:57:09,964 - INFO - proc-rank-0-device-0 (pid: 140)
WARNING:tensorflow:From /usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages/npu_bridge/estimator/npu/npu_optimizer.py:225: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
INFO:root:Using MoXing-v1.17.3-8aa951bc
INFO:root:Using OBS-Python-SDK-3.20.7
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:16: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:16: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:19: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:19: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:23: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:23: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-04-09 20:57:16.395449: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2021-04-09 20:57:16.404146: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1dd75000 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-04-09 20:57:16.404194: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:7: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:7: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:13: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:13: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1634: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1634: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:23: softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:23: softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:373: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:373: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:374: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:374: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:152: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:152: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:154: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:154: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:121: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/contrib/losses/python/losses/loss_ops.py:121: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:25: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
WARNING:tensorflow:From /cache/user-job-dir/code/lenet.py:25: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:32: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
WARNING:tensorflow:From /home/work/user-job-dir/code/Train.py:32: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/python/util/tf_should_use.py:198: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
WARNING:tensorflow:From /usr/local/ma/python3.7/lib/python3.7/site-packages/tensorflow_core/python/util/tf_should_use.py:198: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2021-04-09 20:57:20.046359: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-04-09 20:57:20.046426: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-04-09 20:57:20.047140: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node init is null.
[ModelArts Service Log]modelarts-pipe: will create log file /tmp/log/MA-LeNet-04-08-13-28.log
[ModelArts Service Log]modelarts-pipe: will create log file /tmp/log/MA-LeNet-04-08-13-28.log
[ModelArts Service Log]modelarts-pipe: will write log file /tmp/log/MA-LeNet-04-08-13-28.log
[ModelArts Service Log]modelarts-pipe: param for max log length: 1073741824
[ModelArts Service Log]modelarts-pipe: param for whether exit on overflow: 0
[ModelArts Service Log]modelarts-pipe: total length: 24
[ModelArts Service Log]modelarts-pipe: will create log file /tmp/log/MA-LeNet-04-08-13-28.log
[ModelArts Service Log]modelarts-pipe: will write log file /tmp/log/MA-LeNet-04-08-13-28.log
[ModelArts Service Log]modelarts-pipe: param for max log length: 1073741824
[ModelArts Service Log]modelarts-pipe: param for whether exit on overflow: 0
2021-04-09 20:57:26.892243: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-04-09 20:57:26.892324: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-04-09 20:57:26.892629: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/flat6_1/flatten/Reshape output 0 is [?,?], unknown shape.
2021-04-09 20:57:26.892660: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc7_1/MatMul output 0 is [?,84], unknown shape.
2021-04-09 20:57:26.892683: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc7_1/BiasAdd output 0 is [?,84], unknown shape.
2021-04-09 20:57:26.892703: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc7_1/Relu output 0 is [?,84], unknown shape.
2021-04-09 20:57:26.892722: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc9_1/MatMul output 0 is [?,10], unknown shape.
2021-04-09 20:57:26.892740: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc9_1/BiasAdd output 0 is [?,10], unknown shape.
2021-04-09 20:57:26.892758: W tf_adapter/util/infershape_util.cc:337] The shape of node ArgMax_1 output 0 is [?], unknown shape.
step 0, training accuracy 0.06
2021-04-09 20:57:32.432716: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-04-09 20:57:32.432793: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-04-09 20:57:32.433310: W tf_adapter/util/infershape_util.cc:337] The shape of node softmax_cross_entropy_loss/xentropy/Reshape_1 output 0 is [?,?], unknown shape.
2021-04-09 20:57:32.433437: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/flat6/flatten/Reshape output 0 is [?,?], unknown shape.
2021-04-09 20:57:32.433462: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc7/MatMul output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433484: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc7/BiasAdd output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433504: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc7/Relu output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433534: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/dropout8/dropout/random_uniform/RandomUniform output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433554: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/dropout8/dropout/mul_grad/BroadcastGradientArgs output 0 is [?], unknown shape.
2021-04-09 20:57:32.433569: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/dropout8/dropout/mul_grad/BroadcastGradientArgs output 1 is [?], unknown shape.
2021-04-09 20:57:32.433587: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/dropout8/dropout/random_uniform/mul output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433604: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/dropout8/dropout/GreaterEqual output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433621: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/dropout8/dropout/Cast output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433640: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/dropout8/dropout/mul output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433663: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/dropout8/dropout/mul_1 output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433681: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc9/MatMul output 0 is [?,10], unknown shape.
2021-04-09 20:57:32.433700: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc9/BiasAdd output 0 is [?,10], unknown shape.
2021-04-09 20:57:32.433717: W tf_adapter/util/infershape_util.cc:337] The shape of node Lenet/fc9/Relu output 0 is [?,10], unknown shape.
2021-04-09 20:57:32.433734: W tf_adapter/util/infershape_util.cc:337] The shape of node softmax_cross_entropy_loss/xentropy output 0 is [?], unknown shape.
2021-04-09 20:57:32.433748: W tf_adapter/util/infershape_util.cc:337] The shape of node softmax_cross_entropy_loss/xentropy output 1 is [?,10], unknown shape.
2021-04-09 20:57:32.433788: W tf_adapter/util/infershape_util.cc:337] The shape of node softmax_cross_entropy_loss/ones output 0 is [?], unknown shape.
2021-04-09 20:57:32.433822: W tf_adapter/util/infershape_util.cc:337] The shape of node softmax_cross_entropy_loss/Mul_1 output 0 is [?], unknown shape.
2021-04-09 20:57:32.433858: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/softmax_cross_entropy_loss/Sum_grad/Tile output 0 is [?], unknown shape.
2021-04-09 20:57:32.433878: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/softmax_cross_entropy_loss/xentropy_grad/ExpandDims output 0 is [?,1], unknown shape.
2021-04-09 20:57:32.433898: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/softmax_cross_entropy_loss/xentropy_grad/mul output 0 is [?,10], unknown shape.
2021-04-09 20:57:32.433915: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/fc9/Relu_grad/ReluGrad output 0 is [?,10], unknown shape.
2021-04-09 20:57:32.433950: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/fc9/MatMul_grad/MatMul output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.433982: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/dropout8/dropout/mul_1_grad/Mul output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.434000: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/dropout8/dropout/mul_grad/Mul output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.434016: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/dropout8/dropout/mul_grad/Sum output 0 is ?, unknown shape.
2021-04-09 20:57:32.434034: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/dropout8/dropout/mul_grad/Reshape output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.434057: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/fc7/Relu_grad/ReluGrad output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.434086: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/fc7/MatMul_grad/MatMul output 0 is [?,120], unknown shape.
2021-04-09 20:57:32.434105: W tf_adapter/util/infershape_util.cc:337] The shape of node gradients/Lenet/fc7/MatMul_grad/MatMul_1 output 0 is [?,84], unknown shape.
2021-04-09 20:57:32.434304: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node Adam is null.
step 100, training accuracy 0.12
step 200, training accuracy 0.34
step 300, training accuracy 0.36
step 400, training accuracy 0.48
step 500, training accuracy 0.36
step 600, training accuracy 0.38
step 700, training accuracy 0.32
step 800, training accuracy 0.44
step 900, training accuracy 0.34
step 1000, training accuracy 0.4
step 1100, training accuracy 0.42
step 1200, training accuracy 0.44
step 1300, training accuracy 0.56
step 1400, training accuracy 0.32
step 1500, training accuracy 0.56
step 1600, training accuracy 0.56
step 1700, training accuracy 0.6
step 1800, training accuracy 0.54
step 1900, training accuracy 0.52
step 2000, training accuracy 0.66
step 2100, training accuracy 0.6
step 2200, training accuracy 0.56
step 2300, training accuracy 0.64
step 2400, training accuracy 0.58
step 2500, training accuracy 0.66
step 2600, training accuracy 0.62
step 2700, training accuracy 0.7
step 2800, training accuracy 0.58
step 2900, training accuracy 0.7
step 3000, training accuracy 0.62
step 3100, training accuracy 0.78
step 3200, training accuracy 0.8
step 3300, training accuracy 0.68
step 3400, training accuracy 0.8
step 3500, training accuracy 0.72
step 3600, training accuracy 0.64
step 3700, training accuracy 0.74
step 3800, training accuracy 0.7
step 3900, training accuracy 0.74
step 4000, training accuracy 0.86
step 4100, training accuracy 0.72
step 4200, training accuracy 0.76
step 4300, training accuracy 0.78
step 4400, training accuracy 0.82
step 4500, training accuracy 0.9
step 4600, training accuracy 0.76
step 4700, training accuracy 0.8
step 4800, training accuracy 0.72
step 4900, training accuracy 0.76
2021-04-09 20:58:00.079502: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-04-09 20:58:00.079556: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
checkpoint/variable.ckpt
[Modelarts Service Log]2021-04-09 20:58:08,026 - INFO - Begin destroy training processes
[Modelarts Service Log]2021-04-09 20:58:08,026 - INFO - proc-rank-0-device-0 (pid: 140) has exited
[Modelarts Service Log]2021-04-09 20:58:08,027 - INFO - End destroy training processes
[Modelarts Service Log]2021-04-09 20:58:08,063 - INFO - final upload stdout log done
[ModelArts Service Log]modelarts-pipe: total length: 25587
[Modelarts Service Log]Training end with return code: 0
[Modelarts Service Log]upload ascend-log to s3://cann-lenet/MA-LeNet-04-08-13-28/log/ascend-log/ at 2021-04-09-20:58:08
upload_tail_log.py -l 2048 -o s3://cann-lenet/MA-LeNet-04-08-13-28/log/ascend-log/
[Modelarts Service Log]upload ascend-log end at 2021-04-09-20:58:09
[Modelarts Service Log]handle outputs of training job
[ModelArts Service Log]modelarts-pipe: will create log file /tmp/log/MA-LeNet-04-08-13-28.log
[ModelArts Service Log]modelarts-pipe: will write log file /tmp/log/MA-LeNet-04-08-13-28.log
[ModelArts Service Log]modelarts-pipe: param for max log length: 1073741824
[ModelArts Service Log]modelarts-pipe: param for whether exit on overflow: 0
[Modelarts Service Log][modelarts_logger] modelarts-pipe found
INFO:root:Using MoXing-v1.17.3-8aa951bc
INFO:root:Using OBS-Python-SDK-3.20.7
[ModelArts Service Log][INFO][2021/04/09 20:58:09]: env MA_OUTPUTS is not found, skip the outputs handler
[ModelArts Service Log]modelarts-pipe: total length: 184
[Modelarts Service Log]Training end at 2021-04-09-20:58:09
[Modelarts Service Log]Training completed.



张辉 (Floor 8)
Posted 2021-04-09 21:18:37

张小白 then changed the epoch count to 50000 and ran it one more time.

This time it really should be running on the NPU...
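(Purely for illustration, and only a guess: if the run length is controlled by a constant in the script, the change might look like the line below; the constant name is hypothetical, not taken from the actual repo.)

# Hypothetical name; the real setting in config.py / Train.py may be called something else
TRAIN_EPOCHS = 50000   # raised from 5000 for this longer run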

[screenshot]

The result screenshots are as follows:

[screenshots]



钟林 (Floor 9)
Posted 2021-04-15 10:51:49
A very beginner-friendly, step-by-step walkthrough; this is exactly what everyone most wants to see when learning something new. It suddenly reminds me of myself when I was just starting out. Thanks to 辉哥 for the selfless contribution. Liked!