Installing TensorFlow 1.15.0 and PyTorch 1.2.0 with CUDA 10.0 and cuDNN v7.5
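My recent development work needs both a TensorFlow and a PyTorch environment.
I first installed tf==2.4.0 and then tf==2.6.0, but running my code failed with: Non-OK-status: GpuLaunchKernel(ShuffleInTensor3Simple
My diagnosis was that the TensorFlow version was too new, so I downgraded TensorFlow.
I picked the versions from the CUDA, cuDNN, TensorFlow, and PyTorch compatibility table at https://www.cnblogs.com/ywb123/p/16631780.html.
The relevant lines in my ~/.bashrc are: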
export PATH=path_to/cuda-10.0/bin:${PATH}
export LD_LIBRARY_PATH=path_to/cudnn-10.0-linux-x64-v7.5.0.56/lib64:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=path_to/cuda-10.0/lib64:${LD_LIBRARY_PATH}
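To confirm that a new shell actually picked up these exports, a small check like the one below can help. It is only a sketch: the helper name `entries_containing` and the keywords it searches for are my own choices for illustration, not part of any tool.

```python
import os


def entries_containing(path_value, keyword):
    """Return the entries of a colon-separated path list that contain keyword."""
    # Linux-style ":" separator, matching the exports in ~/.bashrc.
    return [entry for entry in path_value.split(":") if keyword in entry]


if __name__ == "__main__":
    for var, keyword in (("PATH", "cuda"),
                         ("LD_LIBRARY_PATH", "cuda"),
                         ("LD_LIBRARY_PATH", "cudnn")):
        hits = entries_containing(os.environ.get(var, ""), keyword)
        print(var, keyword, hits if hits else "no matching entry")
```

If any of the three lines reports no matching entry, the shell was probably started before ~/.bashrc was edited, or the export was overwritten later.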
So, following my earlier blog post, https://bbs.huaweicloud.com/blogs/240597, I installed tf==1.15.0, pytorch==1.2.0, and torchvision==0.4.0:
1. conda create -n tf_15 tensorflow-gpu=1.15.0 python=3.7.6
2. pip install torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl torchvision-0.4.0-cp37-cp37m-manylinux1_x86_64.whl
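After both installs, it is worth checking that each framework imports and can see the GPU before running real code. The following is a minimal sketch, assuming `tf.test.is_gpu_available()` (present in TensorFlow 1.15) and `torch.cuda.is_available()` as the GPU checks; the `probe` helper is a hypothetical name introduced here.

```python
import importlib


def probe(module_name, gpu_check):
    """Import a framework and return (version, gpu_visible).

    Returns (None, None) when the module cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None, None
    version = getattr(module, "__version__", "unknown")
    try:
        gpu_visible = bool(gpu_check(module))
    except Exception:
        # Treat any failure of the check itself as "GPU not visible".
        gpu_visible = False
    return version, gpu_visible


if __name__ == "__main__":
    checks = {
        "tensorflow": lambda tf: tf.test.is_gpu_available(),
        "torch": lambda torch: torch.cuda.is_available(),
    }
    for name, check in checks.items():
        print(name, probe(name, check))
```

Run it inside the activated conda environment; a result with the expected version and `True` for both frameworks suggests the CUDA and cuDNN libraries are being found.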
3. Running the code then failed with: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by XXX/anaconda3/envs/tf_1_15/lib/python3.7/site-packages/scipy/spatial/ckdtree.cpython-37m-x86_64-linux-gnu.so)
According to https://zhuanlan.zhihu.com/p/283537696, which I found by searching online, the cause is a scipy version that is too new; downgrading scipy fixed it.
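To see which GLIBCXX versions the system libstdc++ actually provides, the library file can be scanned for its version tags, much like `strings libstdc++.so.6 | grep GLIBCXX`. The sketch below only diagnoses the mismatch; the fix used here was still to downgrade scipy. The function name `glibcxx_versions` is hypothetical, and the library path is the one from the error message.

```python
import re


def glibcxx_versions(data):
    """Return the GLIBCXX_x.y.z tags found in raw library bytes, sorted numerically."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data))
    ordered = sorted(found, key=lambda v: tuple(int(p) for p in v.split(b".")))
    return ["GLIBCXX_" + v.decode("ascii") for v in ordered]


if __name__ == "__main__":
    lib_path = "/lib/x86_64-linux-gnu/libstdc++.so.6"  # path taken from the error message
    with open(lib_path, "rb") as handle:
        versions = glibcxx_versions(handle.read())
    print("newest available:", versions[-1] if versions else "none found")
    print("GLIBCXX_3.4.29 present:", "GLIBCXX_3.4.29" in versions)
```

If GLIBCXX_3.4.29 is absent, any package built against a newer libstdc++ will fail to load in the same way.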
4. Saving the inference results to an h5 file failed with:
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 222, in h5py.h5d.DatasetID.write
File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
OSError: Can't write data (file write failed: time = Sat Dec 24 07:39:19 2022
, filename = 'XXXX.h5', file descriptor = 22, errno = 5, error message = 'Input/output error', buf = 0x555a833b7740, total write size = 2923520, bytes this sub-write = 2923520, bytes actually written = 18446744073709551615, offset = 557891852)
Traceback (most recent call last):
File "h5py/_objects.pyx", line 193, in h5py._objects.ObjectID.__dealloc__
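I had been saving to a mounted disk. After changing the output path to a disk local to the server and rerunning inference, the h5 file was saved without this error.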
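If the file still has to end up on the mounted disk, one option is to write it locally first and move the finished file afterwards. This is a sketch under that assumption, not something I verified on the failing mount: the helper `write_locally_then_move` and its parameters are hypothetical, and the final move can still fail if the mount itself is unhealthy.

```python
import os
import shutil
import tempfile


def write_locally_then_move(write_fn, final_path, local_dir=None):
    """Call write_fn(tmp_path) on a local temp file, then move it to final_path."""
    suffix = os.path.splitext(final_path)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=local_dir)
    os.close(fd)  # write_fn reopens the file by path
    try:
        write_fn(tmp_path)
        shutil.move(tmp_path, final_path)
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up if writing or moving failed
    return final_path


if __name__ == "__main__":
    def demo_writer(path):
        # Stand-in for the real writer, e.g. code that opens h5py.File(path, "w").
        with open(path, "wb") as handle:
            handle.write(b"example payload")

    target = os.path.join(tempfile.mkdtemp(), "result.h5")
    print(write_locally_then_move(demo_writer, target))
```

The idea is that the many small writes an HDF5 library performs all hit the local disk, and the mounted disk only sees one sequential copy.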