
sharpcat (Newcomer) · Original post · Updated 2020-10-05 16:45:38 · 196 views, 4 replies
[Training Job] [ModelArts] I want to train yolov3_resnet18 on ModelArts

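[Steps & Symptoms]

I downloaded the code from https://gitee.com/mindspore/mindspore/tree/r0.5/model_zoo/yolov3_resnet18 and want to create a ModelArts training job using "Common frameworks" instead of subscribing to the algorithm from the "Marketplace". The error I get is below. As the boot file I used train.py from the code as-is, and I'm not sure whether it needs to be modified. Any guidance to help me get started would be appreciated.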


[Screenshots]

(two screenshots, not reproduced here)

[Log] (optional: paste the log content or attach a file)

do nothing

[Modelarts Service Log]user: uid=1101(work) gid=1101(work) groups=1101(work),1000(HwHiAiUser)

[Modelarts Service Log]pwd: /home/work

[Modelarts Service Log]app_url: s3://knife/yolov3_resnet18/

[Modelarts Service Log]boot_file: yolov3_resnet18/train.py

[Modelarts Service Log]log_url: /tmp/log/trainjob-yolov3.log

[Modelarts Service Log]command: yolov3_resnet18/train.py --epoch_size=100 --lr=0.001 --image_dir=/knife/yolov3_resnet18/dataset/train2017/ --anno_path=/knife/yolov3_resnet18/dataset/annotations/instances_train.json --data_url=s3://knife/yolov3_resnet18/dataset/ --train_url=s3://knife/yolov3out/V0003/

[Modelarts Service Log]local_code_dir: 

[Modelarts Service Log][modelarts_create_log] modelarts-pipe found

[Modelarts Service Log]handle inputs of training job

INFO:root:Using MoXing-v1.16.1-

INFO:root:Using OBS-Python-SDK-3.1.2

[ModelArts Service Log][INFO][2020/10/05 16:39:29]: env MA_INPUTS is not found, skip the inputs handler

INFO:root:Using MoXing-v1.16.1-

INFO:root:Using OBS-Python-SDK-3.1.2

[ModelArts Service Log]2020-10-05 16:39:30,219 - modelarts-downloader.py[line:612] - INFO: Main: modelarts-downloader starting with Namespace(dst='./', recursive=True, skip_creating_dir=False, src='s3://knife/yolov3_resnet18/', trace=False, type='common', verbose=False)

[Modelarts Service Log][modelarts_logger] modelarts-pipe found

[Modelarts Service Log][INFO] exec pip install

Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple

Requirement already satisfied: numpy<=1.17.5,>=1.17.0 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 1)) (1.17.5)

Requirement already satisfied: protobuf>=3.8.0 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 2)) (3.11.3)

Requirement already satisfied: asttokens>=1.1.13 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 3)) (2.0.4)

Requirement already satisfied: pillow>=6.2.0 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 4)) (7.0.0)

Requirement already satisfied: scipy>=1.3.3 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 5)) (1.3.3)

Requirement already satisfied: easydict>=1.9 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 6)) (1.9)

Requirement already satisfied: sympy>=1.4 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 7)) (1.4)

Requirement already satisfied: cffi>=1.13.2 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 8)) (1.14.0)

Collecting wheel>=0.32.0 (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 9))

  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/a7/00/3df031b3ecd5444d572141321537080b40c1c25e1caa3d86cdd12e5e919c/wheel-0.35.1-py2.py3-none-any.whl

Requirement already satisfied: decorator>=4.4.0 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 10)) (4.4.1)

Requirement already satisfied: setuptools>=40.8.0 in /usr/local/ma/python3.7/lib/python3.7/site-packages (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 11)) (41.2.0)

Collecting matplotlib>=3.1.3 (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 12))

  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/2b/4c/fe4b36325795524f35d39edc390c89584e9a901df9e615df6f5effddaa0e/matplotlib-3.3.2.tar.gz (37.9MB)

Collecting opencv-python>=4.2.0.32 (from -r /home/work/user-job-dir/yolov3_resnet18/pip-requirements.txt (line 13))

  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/38/a9/cd39fd25df434b5d9451dc266c12b72f68282a2b9bd5d7b4aa2d57d6c20e/opencv-python-4.4.0.44.tar.gz (88.9MB)

  Installing build dependencies: started

  Installing build dependencies: finished with status 'error'

  ERROR: Command errored out with exit status 1:

   command: /usr/local/ma/python3.7/bin/python3.7 /usr/local/ma/python3.7/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-l59_gym7/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i http://repo.myhuaweicloud.com/repository/pypi/simple --trusted-host repo.myhuaweicloud.com -- setuptools wheel scikit-build cmake pip 'numpy==1.11.3; python_version=='"'"'3.5'"'"'' 'numpy==1.13.3; python_version=='"'"'3.6'"'"'' 'numpy==1.14.5; python_version=='"'"'3.7'"'"'' 'numpy==1.17.3; python_version>='"'"'3.8'"'"''

       cwd: None

  Complete output (23 lines):

  Ignoring numpy: markers 'python_version == "3.5"' don't match your environment

  Ignoring numpy: markers 'python_version == "3.6"' don't match your environment

  Ignoring numpy: markers 'python_version >= "3.8"' don't match your environment

  Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple

  Collecting setuptools

    Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/44/a6/7fb6e8b3f4a6051e72e4e2218889351f0ee484b9ee17e995f5ccff780300/setuptools-50.3.0-py3-none-any.whl (785kB)

  Collecting wheel

    Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/a7/00/3df031b3ecd5444d572141321537080b40c1c25e1caa3d86cdd12e5e919c/wheel-0.35.1-py2.py3-none-any.whl

  Collecting scikit-build

    Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/78/c9/7c2c7397ea64e36ebb292446896edcdecbb8c1aa6b9a1a32f6f67984c3df/scikit_build-0.11.1-py2.py3-none-any.whl (72kB)

  Collecting cmake

    Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/eb/0a/039d5e4c4e2cf347091fe0e3ee322413e3750a5d4bd1d4b6d8537072687a/cmake-3.18.2.post1.tar.gz

      ERROR: Command errored out with exit status 1:

       command: /usr/local/ma/python3.7/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jqi4wvpw/cmake/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jqi4wvpw/cmake/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info

           cwd: /tmp/pip-install-jqi4wvpw/cmake/

      Complete output (5 lines):

      Traceback (most recent call last):

        File "<string>", line 1, in <module>

        File "/tmp/pip-install-jqi4wvpw/cmake/setup.py", line 7, in <module>

          from skbuild import setup

      ModuleNotFoundError: No module named 'skbuild'

      ----------------------------------------

  ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

  ----------------------------------------

ERROR: Command errored out with exit status 1: /usr/local/ma/python3.7/bin/python3.7 /usr/local/ma/python3.7/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-l59_gym7/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i http://repo.myhuaweicloud.com/repository/pypi/simple --trusted-host repo.myhuaweicloud.com -- setuptools wheel scikit-build cmake pip 'numpy==1.11.3; python_version=='"'"'3.5'"'"'' 'numpy==1.13.3; python_version=='"'"'3.6'"'"'' 'numpy==1.14.5; python_version=='"'"'3.7'"'"'' 'numpy==1.17.3; python_version>='"'"'3.8'"'"'' Check the logs for full command output.

/home/work/user-job-dir

[Modelarts Service Log][modelarts_logger] modelarts-pipe found

[Modelarts Service Log]2020-10-05 16:39:52,248 - INFO - Davinci training command

[Modelarts Service Log]2020-10-05 16:39:52,248 - INFO - ['/usr/bin/python', '/home/work/user-job-dir/yolov3_resnet18/train.py', '--epoch_size=100', '--lr=0.001', '--image_dir=/knife/yolov3_resnet18/dataset/train2017/', '--anno_path=/knife/yolov3_resnet18/dataset/annotations/instances_train.json', '--data_url=s3://knife/yolov3_resnet18/dataset/', '--train_url=s3://knife/yolov3out/V0003/']

[Modelarts Service Log]2020-10-05 16:39:52,248 - INFO - Wait for Rank table file ready

[Modelarts Service Log]2020-10-05 16:39:52,248 - INFO - Rank table file (K8S generated) is ready for read

[Modelarts Service Log]2020-10-05 16:39:52,248 - INFO - 

{

    "status": "completed",

    "group_count": "1",

    "group_list": [

        {

            "group_name": "job-trainjob-yolov3",

            "device_count": "1",

            "instance_count": "1",

            "instance_list": [

                {

                    "pod_name": "job33b33459-job-trainjob-yolov3-0",

                    "server_id": "192.168.0.81",

                    "devices": [

                        {

                            "device_id": "0",

                            "device_ip": "192.1.248.250"

                        }

                    ]

                }

            ]

        }

    ]

}

[Modelarts Service Log]2020-10-05 16:39:52,249 - INFO - Rank table file (C7x)

[Modelarts Service Log]2020-10-05 16:39:52,249 - INFO - 

{

    "status": "completed",

    "version": "1.0",

    "server_count": "1",

    "server_list": [

        {

            "server_id": "192.168.0.81",

            "device": [

                {

                    "device_id": "0",

                    "device_ip": "192.1.248.250",

                    "rank_id": "0"

                }

            ]

        }

    ]

}

[Modelarts Service Log]2020-10-05 16:39:52,249 - INFO - Rank table file (C7x) is generated

[Modelarts Service Log]2020-10-05 16:39:52,249 - INFO - Slogd startup

[Modelarts Service Log]2020-10-05 16:39:52,251 - INFO - Current server

[Modelarts Service Log]2020-10-05 16:39:52,252 - INFO - 

{

    "server_id": "192.168.0.81",

    "device": [

        {

            "device_id": "0",

            "device_ip": "192.1.248.250",

            "rank_id": "0"

        }

    ]

}

[Modelarts Service Log]2020-10-05 16:39:52,252 - INFO - FMK of device0 startup

Start create dataset!

image_dir or anno_path not exits.

Traceback (most recent call last):

  File "/home/work/user-job-dir/yolov3_resnet18/train.py", line 164, in <module>

    main()

  File "/home/work/user-job-dir/yolov3_resnet18/train.py", line 127, in main

    batch_size=args_opt.batch_size, device_num=device_num, rank=rank)

  File "/cache/user-job-dir/yolov3_resnet18/src/dataset.py", line 297, in create_yolo_dataset

    num_parallel_workers=num_parallel_workers, shuffle=is_training)

  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/dataset/engine/validators.py", line 471, in new_method

    check_dataset_file(dataset_file)

  File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/dataset/engine/validators.py", line 137, in check_dataset_file

    raise ValueError("The file {} does not exist or permission denied!".format(dataset_file))

ValueError: The file ./Mindrecord_train/yolo.mindrecord0 does not exist or permission denied!

[Modelarts Service Log]2020-10-05 16:39:55,259 - ERROR - FMK of device0 (pid: [156]) has exited with non-zero code: 1

[Modelarts Service Log]2020-10-05 16:39:55,259 - INFO - Begin destroy FMK processes

[Modelarts Service Log]2020-10-05 16:39:55,259 - INFO - FMK of device0 (pid: [156]) has exited

[Modelarts Service Log]2020-10-05 16:39:55,259 - INFO - End destroy FMK processes

=== begin proc exit    ===

=== begin stop slogd   ===

===   end pro exit     ===

[Modelarts Service Log]Training end with return code: 1

[Modelarts Service Log]Training completed.


Attachment: train.py内容.txt (contents of train.py), 8.65 KB, downloaded 1 time


JeffDing (Intermediate Member) · #2 · Updated 2020-10-06 14:29:19

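To run MindSpore code in a training job on Huawei Cloud ModelArts, you need to modify the code, for example by importing the moxing library and copying the data files from the OBS bucket into the job's local environment so that the program can read them.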
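A minimal sketch of that change, assuming the MoXing call `mox.file.copy_parallel(src_url=..., dst_url=...)` as used in the MindSpore cloud tutorial, and a local staging directory `/cache/data` (an assumed path, not taken from this thread). `--data_url` and `--train_url` are the arguments ModelArts passes in the log above; the idea is to copy the OBS data to local disk first and then point `--image_dir`/`--anno_path` at those local paths instead of `/knife/...`, which the container cannot see. Outside ModelArts, where `moxing` is not installed, the sketch falls back to a plain directory copy so it can be tried locally. The helper names are hypothetical.

```python
import argparse
import os
import shutil

try:
    import moxing as mox  # preinstalled in ModelArts training containers
except ImportError:
    mox = None  # local machine: fall back to a plain directory copy


def stage_from_obs(src_url: str, local_dir: str) -> str:
    """Copy everything under src_url (e.g. s3://bucket/dataset/) into local_dir."""
    os.makedirs(local_dir, exist_ok=True)
    if mox is not None:
        mox.file.copy_parallel(src_url=src_url, dst_url=local_dir)
    else:
        # Local testing only: src_url is treated as an ordinary directory
        # (dirs_exist_ok needs Python 3.8+).
        shutil.copytree(src_url, local_dir, dirs_exist_ok=True)
    return local_dir


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Both URLs are injected by ModelArts when the training job starts.
    parser.add_argument("--data_url", required=True)
    parser.add_argument("--train_url", default="")
    # Hypothetical local staging directory on the training node.
    parser.add_argument("--local_data_dir", default="/cache/data")
    # parse_known_args leaves the other train.py options (--epoch_size, ...) alone.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

In train.py this would run before the dataset is created, e.g. `data_root = stage_from_obs(args.data_url, args.local_data_dir)`, after which `image_dir` and `anno_path` are built with `os.path.join(data_root, ...)` according to your own dataset layout.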

For using MindSpore on Huawei Cloud, see this document: https://www.mindspore.cn/tutorial/training/zh-CN/master/advanced_use/use_on_the_cloud.html

The YoloV3ResNet18 model in the ModelZoo is meant to be trained on the COCO dataset. What dataset are you using?


Training YOLOv3 with MindSpore 0.5 was covered step by step in the third session of the MindSpore training camp. Take a look at that part of the video.

Video: https://www.bilibili.com/video/BV1nf4y1973u


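Comments

sharpcat, 2020-10-6 12:45:
Hi, my dataset is in COCO2017 format. The link you posted above returns a 404.

JeffDing, 2020-10-6 14:31:
Reply to sharpcat: I've updated the link, please take another look. For COCO2017 you first need to generate the pre-training files and then train with them. Honestly, I'd still recommend watching the video; it explains the steps very clearly.

sharpcat, 2020-10-6 15:35:
Reply to JeffDing: OK, thanks!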

HWCloudAI (Administrator) · #3 · Posted 2020-10-07 14:10:43

Hello, we have received your question and have arranged for an expert to reply. Please be patient.

sharpcat (Newcomer) · #4 · Posted 2020-10-07 16:59:37

What does the "anno archive" mean? Here is my dataset:

dataset
      annotations
           instances_train2017.json
           instances_val2017.json
      train2017
           L_10000.jpg
           ... ...
      val2017
           L_10001.jpg
      train.txt

JeffDing (Intermediate Member) · #5 · Updated 2020-10-07 19:05:14

Quoting sharpcat (posted 2020-10-7 16:59): What does the "anno archive" mean? My dataset: dataset annotations …

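The COCO dataset provides five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. The annotations are stored as JSON (the annotations files), and preprocessing uses the COCO API to access and manipulate all of them.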

Reference: https://blog.csdn.net/u013832707/article/details/93710810

2017 Train images [18 GB] (train2017.zip)

2017 Val images [1 GB] (val2017.zip)

2017 Train/Val annotations [241 MB], the annotation files for these images (annotations_trainval2017.zip):

instances: object detection

captions: image captioning

person_keypoints: human keypoint detection
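To make the structure of an `instances_*.json` file concrete: it holds top-level `images`, `annotations` and `categories` lists, and each annotation refers to its image through `image_id` and stores its box as `bbox = [x, y, width, height]` in pixels. The usual tool is the COCO API (`pycocotools`); the sketch below uses only the standard library and a tiny made-up sample to show how boxes are grouped per image and converted to corner coordinates. The helper name and the corner-coordinate output are illustrative choices, not the model zoo's own preprocessing.

```python
import json
from collections import defaultdict


def boxes_per_image(instances_json: str):
    """Map each image file name to a list of (x_min, y_min, x_max, y_max, category_name)."""
    coco = json.loads(instances_json)
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    cat_names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    result = defaultdict(list)
    for ann in coco["annotations"]:
        if ann.get("iscrowd", 0):
            continue  # crowd regions are commonly skipped for detection training
        x, y, w, h = ann["bbox"]  # COCO box: top-left corner plus width/height
        result[file_names[ann["image_id"]]].append(
            (x, y, x + w, y + h, cat_names[ann["category_id"]])
        )
    return dict(result)


# Tiny made-up sample with the same shape as instances_train2017.json.
sample = json.dumps({
    "images": [{"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 18, "name": "dog"}],
    "annotations": [{"id": 7, "image_id": 1, "category_id": 18,
                     "bbox": [10.0, 20.0, 100.0, 50.0], "iscrowd": 0}],
})
print(boxes_per_image(sample))
# {'000000000001.jpg': [(10.0, 20.0, 110.0, 70.0, 'dog')]}
```

For a real file, read it first: `with open("annotations/instances_train2017.json") as f: boxes = boxes_per_image(f.read())`.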


In this model zoo example, the files under anno_file are used when train.py finds no MindRecord file and needs to generate one.
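The control flow implied by the log above can be sketched as follows: look for `yolo.mindrecord0` in the MindRecord directory (the log shows `./Mindrecord_train`), and only if it is missing, convert from `image_dir` and `anno_path`, both of which must exist on the local disk. That explains why a job given OBS-style paths such as `/knife/...` prints `image_dir or anno_path not exits.` and then fails with a `ValueError` on the missing MindRecord file. This is a simplified reconstruction, not the model zoo's code; `convert` is a hypothetical stand-in for whatever conversion routine the model zoo calls.

```python
import os
from typing import Callable


def ensure_mindrecord(mindrecord_dir: str, image_dir: str, anno_path: str,
                      convert: Callable[[str, str, str, str], None],
                      prefix: str = "yolo.mindrecord") -> str:
    """Return the path of the first MindRecord shard, creating it if needed.

    `convert(image_dir, anno_path, mindrecord_dir, prefix)` is a hypothetical
    stand-in for the model zoo's own conversion routine.
    """
    first_shard = os.path.join(mindrecord_dir, prefix + "0")
    if os.path.exists(first_shard):
        return first_shard  # already converted: the annotations are not read again
    os.makedirs(mindrecord_dir, exist_ok=True)
    if not (os.path.isdir(image_dir) and os.path.isfile(anno_path)):
        # Both inputs must be *local* paths; OBS paths are not visible here.
        raise FileNotFoundError(
            f"image_dir ({image_dir}) or anno_path ({anno_path}) does not exist")
    convert(image_dir, anno_path, mindrecord_dir, prefix)
    if not os.path.exists(first_shard):
        raise RuntimeError(f"conversion did not produce {first_shard}")
    return first_shard
```

In the log, the script only prints a message and carries on until the dataset loader fails; this sketch raises immediately with both paths in the message, which makes a wrong path easier to spot.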

