Distributed package doesn‘t have NCCL built in
Distributed package doesn't have NCCL built in
问题描述:
python在windows环境下dist.init_process_group(backend, rank, world_size)处报错‘RuntimeError: Distributed package doesn’t have NCCL built in’,具体信息如下:
File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 531, in init_process_group
timeout=timeout)
File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 625, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in
原因分析:
windows不支持NCCL backend
解决方案:
在dist.init_process_group语句之前添加backend=‘gloo’,也就是在windows中使用GLOO替代NCCL。
————————————————
版权声明:本文为CSDN博主「StarCap」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/StarCap/article/details/120070425
文章来源: blog.csdn.net,作者:AI视觉网奇,版权归原作者所有,如需转载,请联系作者。
原文链接:blog.csdn.net/jacke121/article/details/124652511
- 点赞
- 收藏
- 关注作者
评论(0)