Posted: 2022/09/28 20:53:41 | Last reply: chengxiaoli, 2022/10/11 14:57:18 | Board: MindSpore
His reply:
Error details: the log shows that the p2p connection timed out.

[WARNING] PRE_ACT(56864,ffffb8d23a40,python):2022-09-29-10:03:31.439.039 [mindspore/ccsrc/backend/common/pass/communication_op_fusion.cc:198] GetAllReduceSplitSegment] Split threshold is 0. AllReduce nodes will take default fusion strategy.
[CRITICAL] GE(56864,ffffb8d23a40,python):2022-09-29-10:05:56.138.935 [mindspore/ccsrc/plugin/device/ascend/hal/device/ge_runtime/task/hccl_task.cc:100] Distribute] davinci_model : load task fail, return ret: 1343225860
[CRITICAL] DEVICE(56864,ffffb8d23a40,python):2022-09-29-10:05:56.139.477 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:567] LoadTask] Distribute Task Failed,  error msg: mindspore/ccsrc/plugin/device/ascend/hal/device/ge_runtime/task/hccl_task.cc:100 Distribute] davinci_model : load task fail, return ret: 1343225860
[ERROR] DEVICE(56864,ffffb8d23a40,python):2022-09-29-10:05:56.139.597 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_device_context.cc:660] ReportErrorMessage] Ascend error occurred, error message: EI9999: Inner Error!
EI9999  connected p2p timeout, timeout:120 s.local logicDevid:0,remote physic id:4 The possible causes are as follows:
1.the connection between this device and the target device is abnormal
2.an exception occurred at the target devices
3.The ranktable is not matched.[FUNC:WaitP2PConnected][FILE:p2p_mgmt.cc][LINE:228]
[CRITICAL] DEVICE(56864,ffffb8d23a40,python):2022-09-29-10:05:56.139.626 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_device_context.cc:422] PreprocessBeforeRunGraph] Preprocess failed before run graph 0,  error msg: mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:567 LoadTask] Distribute Task Failed,  error msg: mindspore/ccsrc/plugin/device/ascend/hal/device/ge_runtime/task/hccl_task.cc:100 Distribute] davinci_model : load task fail, return ret: 1343225860
Traceback (most recent call last):
  File "Distribute_Ascend_Mobilevit_train.py", line 121, in
    MobileViT_train(args)
  File "Distribute_Ascend_Mobilevit_train.py", line 103, in MobileViT_train
    dataset_sink_mode=False)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 906, in train
    sink_size=sink_size)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 87, in wrapper
    func(self, *args, **kwargs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 542, in _train
    self._train_process(epoch, train_dataset, list_callback, cb_params)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/model.py", line 794, in _train_process
    outputs = self._train_network(*next_element)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 586, in __call__
    out = self.compile_and_run(*args)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 964, in compile_and_run
    self.compile(*inputs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/nn/cell.py", line 937, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 1006, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_device_context.cc:422 PreprocessBeforeRunGraph] Preprocess failed before run graph 0,  error msg: mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:567 LoadTask] Distribute Task Failed,  error msg: mindspore/ccsrc/plugin/device/ascend/hal/device/ge_runtime/task/hccl_task.cc:100 Distribute] davinci_model : load task fail, return ret: 1343225860
[ERROR] MD(56864,fffec1ffb1e0,python):2022-09-29-10:06:01.350.188 [mindspore/ccsrc/minddata/dataset/util/task_manager.cc:217] InterruptMaster] Task is terminated with err msg(more detail in info level log):Exception thrown from PyFunc. The actual amount of data read from generator 460 is different from generator.len 160146, you should adjust generator.len to make them match.
Line of code : 217
File         : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_CentOS/mindspore/mindspore/ccsrc/minddata/dataset/engine/datasetops/source/generator_op.cc
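
A note on the last [ERROR] MD line: the generator-length message (460 read versus generator.len 160146) may simply be a side effect of the run being aborted after the p2p timeout, not a separate dataset bug. That is an assumption, not something the log confirms. To rule out a real mismatch, a quick offline check along these lines could help. It is a minimal sketch in plain Python, with no MindSpore calls; ToySource, count_readable and length_matches are hypothetical names standing in for the real dataset class and are not taken from the training script.

```python
class ToySource:
    """Hypothetical random-access source; a stand-in for the real dataset class."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __getitem__(self, index):
        return self._samples[index]

    def __len__(self):
        return len(self._samples)


def count_readable(source):
    """Count the items that can be read by index, stopping at the first IndexError."""
    count = 0
    while True:
        try:
            source[count]
        except IndexError:
            return count
        count += 1


def length_matches(source):
    """Return (declared, actual, ok) so a mismatch shows up before training starts."""
    declared = len(source)
    actual = count_readable(source)
    return declared, actual, declared == actual


if __name__ == "__main__":
    declared, actual, ok = length_matches(ToySource(range(8)))
    print(f"declared={declared} actual={actual} match={ok}")
```

If the declared and actual counts agree on a single device, the dataset is probably fine, and the three causes listed under EI9999 (device connection, remote device exception, ranktable mismatch) are the more likely place to look.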