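Guide to Using Huawei Ascend AscendHub Images
[Abstract] AscendHub is Ascend's open Docker image registry. It provides Docker images of Ascend software, including MindX DL images, inference images, training images, operating system images, and others, so that users can quickly deploy Ascend base software and application software.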
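1. References
2. Overview
AscendHub is Ascend's open Docker image registry. It provides Docker images of Ascend software, including MindX DL images, inference images, training images, operating system images, and others, so that users can quickly deploy Ascend base software and application software.
3. Key Steps
3.1 Connect to the Ascend server over SSH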
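3.2 Download an AscendHub image
3.2.1 Choose a suitable image
This guide uses the ascend-infer-arm image as an example.
3.2.2 Pull the image
Obtain login access on AscendHub, then copy the login command and run it on the node: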
[root@10 ~]# docker login -u xxx -p xxx
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
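The first warning above recommends `--password-stdin`. Below is a minimal sketch of a non-interactive login that keeps the password off the command line. It assumes the credentials are in the hypothetical environment variables `ASCENDHUB_USER` and `ASCENDHUB_PASSWORD`. It also assumes the registry host is `ascendhub.huawei.com`, which is taken from the pull command below.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): log in to AscendHub via --password-stdin.
# ASCENDHUB_USER / ASCENDHUB_PASSWORD are hypothetical variable names; set them yourself.
# The default registry host is assumed from the image name used in "docker pull".
ascendhub_login() {
  local registry="${1:-ascendhub.huawei.com}"
  if [[ -z "${ASCENDHUB_USER:-}" || -z "${ASCENDHUB_PASSWORD:-}" ]]; then
    echo "ASCENDHUB_USER and ASCENDHUB_PASSWORD must be set" >&2
    return 1
  fi
  # printf '%s' sends the password without a trailing newline.
  printf '%s' "$ASCENDHUB_PASSWORD" \
    | docker login -u "$ASCENDHUB_USER" --password-stdin "$registry"
}
```

To use it, export the two variables and call `ascendhub_login`. This only removes the first warning. As the second warning says, the credential is still stored unencrypted in /root/.docker/config.json unless a credential helper is configured.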
Pull the image:
[root@10 YOYOFile]# docker pull ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1
21.0.1: Pulling from public-ascendhub/ascend-infer-arm
04da93b342eb: Pull complete
b235194751de: Pull complete
606a67bb8db9: Pull complete
95aa229ab29f: Pull complete
4f15829f025d: Pull complete
17b4b84ba418: Pull complete
8234571b9fec: Pull complete
a243aec954cf: Pull complete
753e401d871f: Pull complete
670e8fb912d7: Pull complete
24e2397ae9a7: Pull complete
40ccaece18cf: Pull complete
1da5f59e5e62: Pull complete
b78de6fe904b: Pull complete
1343d2141111: Pull complete
Digest: sha256:2a2c330515312a5dae9b65cd5c81a3e2b3cab7044336550e1703f1e68762c5b9
Status: Downloaded newer image for ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1
ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1
3.3 Run the Docker container
docker run -it --ipc=host \
--name ascend-infer-arm-2 \
-p 6012:22 \
-e ASCEND_VISIBLE_DEVICES=3 \
--device=/dev/davinci3 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /root/YOYOFile/work:/root/work \
ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1 \
/bin/bash
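Parameter description:
- --ipc=host: the container shares the host's IPC namespace, including shared memory.
- --name: the container name.
- -p: port mapping; host port 6012 is mapped to port 22 in the container.
- -e: the ASCEND_VISIBLE_DEVICES environment variable specifies which NPU devices are mounted into the container.
- --device: the NPU device and the management devices (e.g. /dev/davinci0, /dev/davinci_manager, /dev/hisi_hdc, /dev/devmm_svm).
- -v /usr/local/Ascend/driver: maps the host's Ascend driver into the container.
- -v npu-smi: maps the npu-smi command into the container.
- -v /root/YOYOFile/work:/root/work: the working directory; the code used for model conversion and inference is mapped into the container.
- tag: the image version, e.g. 21.0.1.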
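Note: ==Host port 6012 is mapped to port 22 of the container, so connecting to port 6012 on the host reaches port 22 inside the container. If the host is a cloud server with a public IP address, the container can be reached directly at <kbd>host-ip:6012</kbd>.==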
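The NPU index appears in both `ASCEND_VISIBLE_DEVICES` and `--device`. The host port must match the later `ssh -p` and iptables steps. A small wrapper keeps these values consistent. Below is a minimal sketch: the function name `run_ascend_container` and the `DRY_RUN` switch are hypothetical, and the mounts simply mirror the command above.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): start an ascend-infer-arm container bound to one NPU.
# Usage: run_ascend_container <npu_id> <host_port> <name> [image]
# With DRY_RUN=1 the command is printed instead of executed.
run_ascend_container() {
  local npu_id="$1" host_port="$2" name="$3"
  local image="${4:-ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1}"
  if [[ ! "$npu_id" =~ ^[0-9]+$ || ! "$host_port" =~ ^[0-9]+$ || -z "$name" ]]; then
    echo "usage: run_ascend_container <npu_id> <host_port> <name> [image]" >&2
    return 1
  fi
  # The same npu_id drives ASCEND_VISIBLE_DEVICES and /dev/davinciN;
  # host_port is the port later used for "ssh -p" and the iptables rule.
  local -a cmd=(
    docker run -it --ipc=host
    --name "$name"
    -p "${host_port}:22"
    -e "ASCEND_VISIBLE_DEVICES=${npu_id}"
    "--device=/dev/davinci${npu_id}"
    --device=/dev/davinci_manager
    --device=/dev/devmm_svm
    --device=/dev/hisi_hdc
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver
    -v /usr/local/dcmi:/usr/local/dcmi
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi
    -v /root/YOYOFile/work:/root/work
    "$image" /bin/bash
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_ascend_container 3 6012 ascend-infer-arm-2` prints a command containing `-p 6012:22` and `--device=/dev/davinci3`. Drop `DRY_RUN=1` to actually start the container.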
Check the Docker container:
docker ps -a
==In the output, host 0.0.0.0:6012 should be mapped to port 22 of the container.==
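Reading the mapping from `docker ps -a` by eye is easy to get wrong when several containers are running. Below is a minimal sketch that checks it with `docker port` instead. The function name `check_ssh_mapping` is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): confirm that port 22 of a container is
# published on the expected host port, based on `docker port <container> 22`.
check_ssh_mapping() {
  local name="$1" expected="$2" mapping
  if ! mapping=$(docker port "$name" 22); then
    echo "cannot read port mapping for $name" >&2
    return 1
  fi
  # Typical output is "0.0.0.0:6012"; newer Docker versions may add a "[::]:6012" line.
  if grep -qE ":${expected}\$" <<<"$mapping"; then
    echo "ok: $name port 22 is published on host port $expected"
  else
    echo "unexpected mapping for $name: $mapping" >&2
    return 1
  fi
}
```

For example, `check_ssh_mapping ascend-infer-arm-2 6012` should report `ok` for the container started above.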
3.4 Install SSH in the container
Install SSH:
# CentOS (quote the pattern so the shell does not expand it)
yum -y install 'openssh*'
# Ubuntu
apt install passwd openssl openssh-server openssh-client -y
root@6c2955252e11:/tmp# apt install passwd openssl openssh-server openssh-client -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
openssl is already the newest version (1.1.1-1ubuntu2.1~18.04.17).
Suggested packages:
keychain libpam-ssh monkeysphere ssh-askpass molly-guard rssh ufw
The following packages will be upgraded:
openssh-client openssh-server openssh-sftp-server passwd
4 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.
Need to get 1617 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 passwd arm64 1:4.5-1ubuntu2.2 [762 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 openssh-sftp-server arm64 1:7.6p1-4ubuntu0.7 [38.9 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 openssh-server arm64 1:7.6p1-4ubuntu0.7 [291 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 openssh-client arm64 1:7.6p1-4ubuntu0.7 [525 kB]
Fetched 1617 kB in 9s (188 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 20171 files and directories currently installed.)
Preparing to unpack .../passwd_1%3a4.5-1ubuntu2.2_arm64.deb ...
Unpacking passwd (1:4.5-1ubuntu2.2) over (1:4.5-1ubuntu2) ...
Setting up passwd (1:4.5-1ubuntu2.2) ...
(Reading database ... 20171 files and directories currently installed.)
Preparing to unpack .../openssh-sftp-server_1%3a7.6p1-4ubuntu0.7_arm64.deb ...
Unpacking openssh-sftp-server (1:7.6p1-4ubuntu0.7) over (1:7.6p1-4ubuntu0.6) ...
Preparing to unpack .../openssh-server_1%3a7.6p1-4ubuntu0.7_arm64.deb ...
Unpacking openssh-server (1:7.6p1-4ubuntu0.7) over (1:7.6p1-4ubuntu0.6) ...
Preparing to unpack .../openssh-client_1%3a7.6p1-4ubuntu0.7_arm64.deb ...
Unpacking openssh-client (1:7.6p1-4ubuntu0.7) over (1:7.6p1-4ubuntu0.6) ...
Setting up openssh-client (1:7.6p1-4ubuntu0.7) ...
Setting up openssh-sftp-server (1:7.6p1-4ubuntu0.7) ...
Setting up openssh-server (1:7.6p1-4ubuntu0.7) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/aarch64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/aarch64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/aarch64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/aarch64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of restart.
Processing triggers for systemd (237-3ubuntu10.53) ...
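Enable password login:
vi /etc/ssh/sshd_config
# Add the following two lines
PermitRootLogin yes
PasswordAuthentication yes
==Note==: edit sshd_config (the server configuration), not ssh_config (the client configuration). Do not confuse the two.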
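Instead of editing in vi, the two settings can be applied with sed, so that rerunning the step does not add duplicate lines. Below is a minimal sketch. The function names are hypothetical, and the code assumes GNU sed, as in the Ubuntu-based image used here.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): set sshd_config directives idempotently.
# An existing directive (commented out or not) is rewritten in place; otherwise it is appended.
set_sshd_option() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

enable_sshd_password_login() {
  local file="${1:-/etc/ssh/sshd_config}"   # sshd_config, NOT ssh_config
  set_sshd_option "$file" PermitRootLogin yes
  set_sshd_option "$file" PasswordAuthentication yes
}
```

Rewriting in place matters because sshd uses the first value it finds for most keywords. An appended line can therefore be ignored if an earlier uncommented `PermitRootLogin` exists. After running `enable_sshd_password_login`, start sshd with `service ssh start`.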
Set a password
Change the root password in the container:
passwd
root@6c2955252e11:/tmp# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
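When the container is prepared from a script, the interactive `passwd` prompt can be replaced with `chpasswd`, which reads `user:password` lines from standard input. Below is a minimal sketch. The function name and the `ROOT_PASSWORD` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): set the root password without an interactive prompt.
# chpasswd reads "user:password" lines from stdin; run it as root inside the container.
set_root_password() {
  local password="${1:-${ROOT_PASSWORD:-}}"
  if [[ -z "$password" ]]; then
    echo "usage: set_root_password <password> (or set ROOT_PASSWORD)" >&2
    return 1
  fi
  printf 'root:%s\n' "$password" | chpasswd
}
```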
Start sshd
service ssh start
root@6c2955252e11:/tmp# service ssh start
* Starting OpenBSD Secure Shell server sshd
Check the sshd status
service ssh status
root@6c2955252e11:/tmp# service ssh status
* sshd is running
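3.5 (Optional) Open the port on the host firewall
==If a firewall is enabled on the server, allow the port through it. If not, skip this step.==
iptables -A INPUT -p tcp --dport 6012 -j ACCEPT
# Save the configuration
service iptables save
# Restart iptables (either command works)
service iptables restart
systemctl restart iptables
# Check whether iptables is active (either command works)
service iptables status
systemctl status iptables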
[root@10 ~]# service iptables stop
Redirecting to /bin/systemctl stop iptables.service
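`iptables -A` appends a new rule every time it runs, so repeating this step stacks duplicate ACCEPT rules. Below is a minimal sketch that first checks for an identical rule with `iptables -C`. The function name `allow_tcp_port` is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): allow inbound TCP on a host port, only once.
# `iptables -C` exits with status 0 if an identical rule already exists.
allow_tcp_port() {
  local port="$1"
  if [[ ! "$port" =~ ^[1-9][0-9]*$ ]] || (( port > 65535 )); then
    echo "invalid port: $port" >&2
    return 1
  fi
  if iptables -C INPUT -p tcp --dport "$port" -j ACCEPT 2>/dev/null; then
    echo "rule for port $port already present"
  else
    iptables -A INPUT -p tcp --dport "$port" -j ACCEPT
    echo "rule for port $port added"
  fi
}
```

For example, run `allow_tcp_port 6012` as root. The rule still has to be persisted with `service iptables save`, as above.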
3.6 Connect to the container remotely over SSH
ssh -p 6012 root@<host-ip>
# The port is the host (server) port, not the port exposed by the container. Do not mix them up.
3.7 Test ATC model conversion
Download the model file:
# Download
wget https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/turing/resourcecenter/model/Resnet50V1/zh/1.1/ATC_Resnet50_V1_from_Tensorflow_Ascend310.zip
# Extract
unzip ATC_Resnet50_V1_from_Tensorflow_Ascend310.zip
Set the environment variables:
source /usr/local/Ascend/nnrt/set_env.sh
or
source /usr/local/Ascend/ascend-toolkit/set_env.sh
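Which of the two scripts exists depends on the image and on what is mounted from the host. Below is a minimal sketch that sources the first one found. The function name `source_cann_env` is hypothetical, and the search order simply follows the order above. If you applied the FAQ fix below and want the host toolkit to take precedence, swap the two paths.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): source the first CANN set_env.sh that exists.
# Candidate paths are the two used in this guide; the root is a parameter for testing.
source_cann_env() {
  local root="${1:-/usr/local/Ascend}" f
  for f in "$root/nnrt/set_env.sh" "$root/ascend-toolkit/set_env.sh"; do
    if [[ -f "$f" ]]; then
      # shellcheck disable=SC1090
      source "$f"
      echo "sourced $f"
      return 0
    fi
  done
  echo "no set_env.sh found under $root" >&2
  return 1
}
```

Call it without `$(...)`, e.g. `source_cann_env`, so that the exported variables stay in the current shell.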
Convert the model with ATC:
atc --model=./resnet_v1_50.pb --framework=3 --output=./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32 --output_type=FP32 --soc_version=Ascend310 --input_shape=input:1,224,224,3
Successful conversion:
root@a87e10f43473:/tmp# atc --model=./resnet_v1_50.pb --framework=3 --output=./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32 --output_type=FP32 --soc_version=Ascend310 --input_shape=input:1,224,224,3
ATC start working now, please wait for a moment.
ATC run success, welcome to the next use.
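ATC writes the offline model to the `--output` path with `.om` appended. A script can therefore check for that file instead of parsing the console output. Below is a minimal sketch that wraps the command above. The function name `convert_resnet50` is hypothetical, and the default paths are the ones used in this section.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): run the ATC conversion from this section and
# confirm that the offline model file was produced (ATC appends ".om" to --output).
convert_resnet50() {
  local model="${1:-./resnet_v1_50.pb}"
  local out="${2:-./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32}"
  if [[ ! -f "$model" ]]; then
    echo "model not found: $model" >&2
    return 1
  fi
  atc --model="$model" --framework=3 --output="$out" \
      --output_type=FP32 --soc_version=Ascend310 \
      --input_shape=input:1,224,224,3 || return 1
  if [[ -f "${out}.om" ]]; then
    echo "ok: ${out}.om"
  else
    echo "atc finished but ${out}.om is missing" >&2
    return 1
  fi
}
```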
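3.8 Common commands
# Check which host ports are being listened on (including ports published for containers)
netstat -tunlp
# On the host, view the container's mapping for port 22
docker port container-name 22
# Enter the container
docker exec -it <container_id> /bin/bash
# Inspect the container (including its network settings)
docker inspect <container_id>
# List all containers
docker ps -a
# List all images
docker images
# Save a container as an image (arguments: container ID, name of the new image)
docker commit 84641e8d0f74 centos_6.6_ssh
# Copy a file from the host into the container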
docker cp xxx.zip <container_id>:/tmp
4. FAQ
Q: The CANN environment in the Docker image is incompatible with the host
root@2a9571160a25:~/work# atc --model=./resnet_v1_50.pb --framework=3 --output=./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32 --output_type=FP32 --soc_version=Ascend310 --input_shape=input:1,224,224,3
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
ATC start working now, please wait for a moment.
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
ATC run success, welcome to the next use.
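Cause:
The host's driver is mapped into the container, but the image ships its own nnrt package, and the CANN version in that package is not compatible with the host driver. As a result, the CANN environment in the image is incompatible with the host.
Solution:
1. When starting the Docker container, change the mount option
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver
to
-v /usr/local/Ascend:/usr/local/Ascend
2. Change the environment script that is sourced
source /usr/local/Ascend/nnrt/set_env.sh
to
source /usr/local/Ascend/ascend-toolkit/set_env.sh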
Q: Failed to start firewalld - dynamic firewall daemon
Failed to start firewalld - dynamic firewall daemon.
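Cause:
1. Python: the firewalld scripts run through the #!/usr/bin/python shebang, which may point to an incompatible Python version.
2. A problem with the firewalld process itself.
Method 1:
Edit the first line of the following scripts: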
/usr/sbin/firewalld
/usr/bin/firewall-cmd
#!/usr/bin/python -Es
to
#!/usr/bin/python2
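Method 1 can also be applied with sed rather than by hand. Below is a minimal sketch. The function name `fix_python_shebang` is hypothetical. It touches only line 1, only when that line is exactly `#!/usr/bin/python -Es`, and it keeps a `.bak` copy. It assumes GNU sed.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): rewrite the shebang of each given script from
# "#!/usr/bin/python -Es" to "#!/usr/bin/python2", keeping a .bak copy (GNU sed -i.bak).
fix_python_shebang() {
  local f
  for f in "$@"; do
    if [[ ! -f "$f" ]]; then
      echo "skip (not found): $f" >&2
      continue
    fi
    sed -i.bak '1s|^#!/usr/bin/python -Es$|#!/usr/bin/python2|' "$f"
  done
}
```

For example, as root: `fix_python_shebang /usr/sbin/firewalld /usr/bin/firewall-cmd`.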
Method 2:
If Method 1 does not help, try the following:
systemctl stop firewalld
pkill -f firewalld
systemctl start firewalld
systemctl status firewalld
Q: /etc/sysconfig/iptables not found
The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
Solution:
1. Install or update iptables-services:
yum install iptables-services
2. Save the configuration:
service iptables save
3. After this, the /etc/sysconfig/iptables file exists.
Q: docker run fails
docker: Error response from daemon: driver failed programming external connectivity on endpoint ascend-infer-arm-1 (1c508946fd80fe24a2bcb62f63fbd08e3a98f7834a42660c368cb3a0609fec80): (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 6018 -j DNAT --to-destination 172.17.0.3:22 ! -i docker0: iptables: No chain/target/match by that name.
(exit status 1)).
ERRO[0000] error waiting for container: context canceled
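Cause:
Docker publishes container ports through its own DOCKER chain in the iptables nat table. Restarting or reloading the iptables service (for example in step 3.5) typically flushes that chain, so it no longer exists when Docker tries to add a new port mapping.
Solution:
Restart Docker so that it recreates its iptables chains: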
systemctl restart docker
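[Copyright notice] This article is original content by a Huawei Cloud Community user. When reposting, you must credit the source (Huawei Cloud Community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find suspected plagiarism in this community, please report it by email with supporting evidence. Once verified, the community will immediately remove the infringing content. Report email: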
cloudbbs@huaweicloud.com