Guide to Using Huawei Ascend AscendHub Images

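Posted by Innovation Center on 2022/08/16 19:41:31
[Abstract] AscendHub, the Ascend image repository, is Ascend's open Docker image registry. It provides Docker images of Ascend software, including MindX DL images, inference images, training images, operating system images, and others, so that users can quickly deploy Ascend base software and application software.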

1. References

Ubuntu SSH configuration

Connecting to Docker remotely over SSH

Logging in to a Docker container over SSH

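2. Introduction

AscendHub image repository

AscendHub, the Ascend image repository, is Ascend's open Docker image registry. It provides Docker images of Ascend software, including MindX DL images, inference images, training images, operating system images, and others, so that users can quickly deploy Ascend base software and application software.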

3. Key Steps

3.1 Connect to the Ascend server over SSH

[Figure: Connecting to the server over SSH]

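3.2 Download an AscendHub image

AscendHub official website

3.2.1 Select a suitable image

This guide uses ascend-infer-arm as an example.

[Figure: Filtering AscendHub images]

3.2.2 Download the image

[Figure: Downloading an AscendHub image]

Obtain login access on the AscendHub page, then copy the generated login command to the node and run it: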

[root@10 ~]# docker login -u xxx -p xxx
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
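The warning above appears because the password is passed on the command line with -p, where it is visible in the process list and shell history. The following is a minimal sketch that performs the same login by reading the password from standard input with docker login --password-stdin. It assumes Bash, a hypothetical ascendhub_login function, credentials held in hypothetical ASCENDHUB_USER and ASCENDHUB_PASSWORD variables, and a default registry of ascendhub.huawei.com (the host in the image path used below). The login command copied from the AscendHub page may already include the registry.

```shell
#!/usr/bin/env bash
# Sketch: log in to AscendHub without putting the password on the command line.
# ASCENDHUB_USER / ASCENDHUB_PASSWORD are hypothetical variable names.
# The registry defaults to the host used by the image path in this guide.
ascendhub_login() {
  local registry="${1:-ascendhub.huawei.com}"
  if [ -z "${ASCENDHUB_USER:-}" ] || [ -z "${ASCENDHUB_PASSWORD:-}" ]; then
    echo "ASCENDHUB_USER and ASCENDHUB_PASSWORD must be set" >&2
    return 1
  fi
  # --password-stdin makes docker read the password from standard input;
  # printf is a shell builtin, so the password never appears as a process argument.
  printf '%s' "$ASCENDHUB_PASSWORD" |
    docker login -u "$ASCENDHUB_USER" --password-stdin "$registry"
}
```

For example, run read -rs ASCENDHUB_PASSWORD to type the password without echoing it, then call ASCENDHUB_USER=<user> ascendhub_login. Docker still stores the credentials in /root/.docker/config.json unless a credential helper is configured.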

Pull the image:

[root@10 YOYOFile]# docker pull ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1
21.0.1: Pulling from public-ascendhub/ascend-infer-arm
04da93b342eb: Pull complete
b235194751de: Pull complete
606a67bb8db9: Pull complete
95aa229ab29f: Pull complete
4f15829f025d: Pull complete
17b4b84ba418: Pull complete
8234571b9fec: Pull complete
a243aec954cf: Pull complete
753e401d871f: Pull complete
670e8fb912d7: Pull complete
24e2397ae9a7: Pull complete
40ccaece18cf: Pull complete
1da5f59e5e62: Pull complete
b78de6fe904b: Pull complete
1343d2141111: Pull complete
Digest: sha256:2a2c330515312a5dae9b65cd5c81a3e2b3cab7044336550e1703f1e68762c5b9
Status: Downloaded newer image for ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1
ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1
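Pulling again an image that is already on the node only re-checks its layers. Below is a minimal sketch with a hypothetical ensure_image function. It looks the image up in the local store with docker image inspect, which exits non-zero when the image is absent, and pulls only in that case.

```shell
#!/usr/bin/env bash
# Sketch: pull an image only if it is not already present locally.
ensure_image() {
  local image="$1"
  # docker image inspect exits non-zero when the image does not exist locally.
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "image already present: $image"
  else
    docker pull "$image"
  fi
}
```

Usage: ensure_image ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1. The trade-off is that a tag updated on the registry is not re-pulled, so run docker pull directly when you want the latest image for that tag.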

3.3 Run the Docker container

docker run -it --ipc=host \
--name ascend-infer-arm-2 \
-p 6012:22 \
-e ASCEND_VISIBLE_DEVICES=3 \
--device=/dev/davinci3 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /root/YOYOFile/work:/root/work \
ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1 \
/bin/bash

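Parameter description:

  • --ipc=host: the container shares the host's IPC namespace, including shared memory.
  • --name: the name of the container.
  • -p: port mapping; maps port 6012 on the host to port 22 in the container.
  • -e: uses the ASCEND_VISIBLE_DEVICES environment variable to specify the NPU device(s) mounted into the container.
  • --device: the NPU device and the management devices (e.g. /dev/davinciN, here /dev/davinci3, plus /dev/davinci_manager, /dev/hisi_hdc, /dev/devmm_svm).
  • -v /usr/local/Ascend/driver: mounts the host's Ascend driver into the container.
  • -v npu-smi: mounts the npu-smi command into the container.
  • -v /work: the working directory; mounts the directory holding the model conversion and inference code into the container.
  • The tag is the image version, e.g. 21.0.1.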

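Note: ==Port 6012 on the host is mapped to port 22 exposed by the Docker container, so connecting to port 6012 on the host reaches port 22 in the container directly. If the host is a cloud server with a public IP address, the container can be reached at <kbd>host_ip:6012</kbd>.==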
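The device ID appears twice in the docker run command above (ASCEND_VISIBLE_DEVICES and /dev/davinciN). The host port also has to match the one used later for SSH and the firewall rule. The sketch below assumes Bash and wraps the same command in a hypothetical run_ascend_container function so that each value is given only once. The mounts, working directory, and image are copied from the command above.

```shell
#!/usr/bin/env bash
# Sketch: start an ascend-infer-arm container bound to one NPU.
# Arguments: NPU device ID, host port mapped to the container's port 22,
# container name. Mounts mirror the docker run command in this guide.
run_ascend_container() {
  if [ "$#" -ne 3 ]; then
    echo "usage: run_ascend_container <npu_id> <host_port> <name>" >&2
    return 1
  fi
  local dev_id="$1" host_port="$2" name="$3"
  local image="ascendhub.huawei.com/public-ascendhub/ascend-infer-arm:21.0.1"
  docker run -it --ipc=host \
    --name "$name" \
    -p "${host_port}:22" \
    -e ASCEND_VISIBLE_DEVICES="$dev_id" \
    --device="/dev/davinci${dev_id}" \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /root/YOYOFile/work:/root/work \
    "$image" /bin/bash
}
```

For example, run_ascend_container 3 6012 ascend-infer-arm-2 starts a container on NPU 3 with host port 6012 mapped to port 22.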

[Figure: Viewing the Docker container]

==Port 22 of the container is mapped to 0.0.0.0:6012 on the host.==

docker ps -a

[Figure: Viewing the images]

3.4 Install SSH in the container

Install SSH:

# CentOS
yum -y install 'openssh*'

# Ubuntu
apt install passwd openssl openssh-server openssh-client -y
root@6c2955252e11:/tmp# apt install passwd openssl openssh-server openssh-client -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
openssl is already the newest version (1.1.1-1ubuntu2.1~18.04.17).
Suggested packages:
  keychain libpam-ssh monkeysphere ssh-askpass molly-guard rssh ufw
The following packages will be upgraded:
  openssh-client openssh-server openssh-sftp-server passwd
4 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.
Need to get 1617 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 passwd arm64 1:4.5-1ubuntu2.2 [762 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 openssh-sftp-server arm64 1:7.6p1-4ubuntu0.7 [38.9 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 openssh-server arm64 1:7.6p1-4ubuntu0.7 [291 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 openssh-client arm64 1:7.6p1-4ubuntu0.7 [525 kB]
Fetched 1617 kB in 9s (188 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 20171 files and directories currently installed.)
Preparing to unpack .../passwd_1%3a4.5-1ubuntu2.2_arm64.deb ...
Unpacking passwd (1:4.5-1ubuntu2.2) over (1:4.5-1ubuntu2) ...
Setting up passwd (1:4.5-1ubuntu2.2) ...
(Reading database ... 20171 files and directories currently installed.)
Preparing to unpack .../openssh-sftp-server_1%3a7.6p1-4ubuntu0.7_arm64.deb ...
Unpacking openssh-sftp-server (1:7.6p1-4ubuntu0.7) over (1:7.6p1-4ubuntu0.6) ...
Preparing to unpack .../openssh-server_1%3a7.6p1-4ubuntu0.7_arm64.deb ...
Unpacking openssh-server (1:7.6p1-4ubuntu0.7) over (1:7.6p1-4ubuntu0.6) ...
Preparing to unpack .../openssh-client_1%3a7.6p1-4ubuntu0.7_arm64.deb ...
Unpacking openssh-client (1:7.6p1-4ubuntu0.7) over (1:7.6p1-4ubuntu0.6) ...
Setting up openssh-client (1:7.6p1-4ubuntu0.7) ...
Setting up openssh-sftp-server (1:7.6p1-4ubuntu0.7) ...
Setting up openssh-server (1:7.6p1-4ubuntu0.7) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/aarch64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/aarch64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/aarch64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/aarch64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of restart.
Processing triggers for systemd (237-3ubuntu10.53) ...

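Enable password login

vi /etc/ssh/sshd_config

# Add the following two lines
PermitRootLogin yes
PasswordAuthentication yes

==Note==: edit sshd_config (the SSH server configuration), not ssh_config (the SSH client configuration). Do not confuse the two.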
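Editing the file in vi works when done by hand. When preparing several containers, the same change can be scripted. The following is a minimal sketch that assumes GNU sed and uses a hypothetical enable_password_login function. It rewrites existing (possibly commented-out) PermitRootLogin and PasswordAuthentication lines to "yes", appends them when they are missing, and keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Sketch: enable root password login in sshd_config without an editor.
# Rewrites existing PermitRootLogin / PasswordAuthentication lines (commented
# or not) to "yes", or appends them if absent. Requires GNU sed (-i -E).
enable_password_login() {
  local cfg="${1:-/etc/ssh/sshd_config}" key
  [ -f "$cfg" ] || { echo "not found: $cfg" >&2; return 1; }
  cp -p "$cfg" "${cfg}.bak"   # keep a backup of the original file
  for key in PermitRootLogin PasswordAuthentication; do
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$cfg"; then
      # Replace the whole line, dropping a leading '#' if present.
      sed -i -E "s/^[#[:space:]]*${key}[[:space:]].*/${key} yes/" "$cfg"
    else
      printf '%s yes\n' "$key" >> "$cfg"
    fi
  done
}
```

Run enable_password_login inside the container, then /usr/sbin/sshd -t to check the configuration for syntax errors before starting sshd. Lines appended after an active Match block would apply only to that block, so check the end of the file if it contains one.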

Set a password

Change the password of the root account in the container:

passwd
root@6c2955252e11:/tmp# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully

Start sshd

service ssh start
root@6c2955252e11:/tmp# service ssh start
 * Starting OpenBSD Secure Shell server sshd

Check the sshd status

service ssh status
root@6c2955252e11:/tmp# service ssh status
 * sshd is running
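The container's main process is the /bin/bash passed to docker run. An sshd started by hand therefore does not come back when the container is stopped and started again. Below is a minimal sketch with a hypothetical start_container_sshd function. It starts the container from the host and then launches sshd inside it with docker exec.

```shell
#!/usr/bin/env bash
# Sketch: start a (stopped) container and launch sshd inside it.
# sshd is not restarted automatically because the container's main
# process is /bin/bash, not an init system.
start_container_sshd() {
  local container="$1"
  docker start "$container" >/dev/null || return 1
  docker exec "$container" service ssh start
}
```

Usage on the host: start_container_sshd ascend-infer-arm-2. Because the container was started with -it, typing exit in its interactive shell stops the container. Pressing Ctrl+P and then Ctrl+Q detaches and leaves it running.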

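3.5 (Optional) Allow the port through the host firewall

==If the firewall is enabled on the server, add a rule that allows the port. If it is not enabled, skip this step.==

iptables -A INPUT -p tcp --dport 6012 -j ACCEPT

# Save the configuration (requires the iptables-services package; see the FAQ)
service iptables save

# Restart iptables (either command works)
service iptables restart
systemctl restart iptables

# Check whether iptables is active (either command works)
service iptables status
systemctl status iptables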
[root@10 ~]# service iptables stop
Redirecting to /bin/systemctl stop iptables.service
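Running iptables -A more than once appends duplicate rules. If the INPUT chain ends with a catch-all REJECT rule, as the default CentOS 7 iptables-services rule set does, a rule appended after it is never reached. The following is a minimal sketch with a hypothetical allow_tcp_port function. It first checks for an identical rule with iptables -C and otherwise inserts the rule at the top of the chain with -I. Depending on the host's firewall setup, traffic to a port published with docker run -p may be handled by Docker's own NAT and FORWARD rules rather than the INPUT chain.

```shell
#!/usr/bin/env bash
# Sketch: allow inbound TCP traffic to a host port, adding the rule only once.
allow_tcp_port() {
  local port="$1"
  # iptables -C exits 0 if an identical rule already exists in the chain.
  if iptables -C INPUT -p tcp --dport "$port" -j ACCEPT 2>/dev/null; then
    echo "rule for tcp/$port already present"
  else
    # -I inserts at the top of INPUT, ahead of any trailing REJECT rule.
    iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
  fi
}
```

Usage: allow_tcp_port 6012, followed by service iptables save to persist the rule.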

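3.6 Connect to the container remotely over SSH

ssh root@<host_ip> -p 6012
# The port is the host's (server's) port, not the port exposed inside the Docker container. Do not mix them up.

[Figure: Logging in to the Docker container over SSH]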

3.7 Test ATC model conversion

Download the model file

# Download
wget https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/turing/resourcecenter/model/Resnet50V1/zh/1.1/ATC_Resnet50_V1_from_Tensorflow_Ascend310.zip

# Unzip
unzip ATC_Resnet50_V1_from_Tensorflow_Ascend310.zip

Set the environment variables

source /usr/local/Ascend/nnrt/set_env.sh
or
source /usr/local/Ascend/ascend-toolkit/set_env.sh
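Which of the two scripts exists depends on what is installed and mounted. The image ships the nnrt package, while ascend-toolkit is what the FAQ fix below switches to after mounting the host's whole /usr/local/Ascend. The sketch below uses a hypothetical find_ascend_set_env function. It prefers ascend-toolkit and falls back to nnrt, and takes the root directory as a parameter defaulting to /usr/local/Ascend.

```shell
#!/usr/bin/env bash
# Sketch: print the path of the CANN set_env.sh to source,
# preferring ascend-toolkit over nnrt.
find_ascend_set_env() {
  local root="${1:-/usr/local/Ascend}" candidate
  for candidate in "$root/ascend-toolkit/set_env.sh" "$root/nnrt/set_env.sh"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no set_env.sh found under $root" >&2
  return 1
}
```

Usage: env_script=$(find_ascend_set_env) && source "$env_script".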

Convert the model with ATC (--framework=3 selects TensorFlow as the source framework)

atc --model=./resnet_v1_50.pb --framework=3 --output=./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32 --output_type=FP32 --soc_version=Ascend310 --input_shape=input:1,224,224,3

Output of a successful conversion:

root@a87e10f43473:/tmp# atc --model=./resnet_v1_50.pb --framework=3 --output=./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32 --output_type=FP32 --soc_version=Ascend310 --input_shape=input:1,224,224,3
ATC start working now, please wait for a moment.
ATC run success, welcome to the next use.
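The first FAQ entry below shows ATC printing "ATC run success" even while the driver reports errors, so the console message alone is a weak check. Below is a minimal sketch with a hypothetical convert_resnet50 function. It runs the same command, checks the exit status, and confirms that the .om file exists. ATC appends the .om suffix to the name given in --output.

```shell
#!/usr/bin/env bash
# Sketch: run the ResNet-50 conversion and confirm the .om file was written.
# ATC appends the .om suffix to the name given in --output.
convert_resnet50() {
  local out="./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32"
  atc --model=./resnet_v1_50.pb \
    --framework=3 \
    --output="$out" \
    --output_type=FP32 \
    --soc_version=Ascend310 \
    --input_shape=input:1,224,224,3 || return 1
  if [ -f "${out}.om" ]; then
    echo "generated ${out}.om"
  else
    echo "atc finished but ${out}.om was not found" >&2
    return 1
  fi
}
```

Run it from the directory containing resnet_v1_50.pb, after sourcing set_env.sh.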

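3.8 Common commands

# List listening ports on the host, including ports published for Docker containers
netstat -tunlp

# Show the host port mapped to port 22 of a container (run on the host)
docker port <container_name> 22

# Open a shell in a running container
docker exec -it <container_id> /bin/bash

# Inspect a container's configuration, including its network settings
docker inspect <container_id>

# List all containers
docker ps -a

# List all images
docker images

# Save a container as an image
docker commit 84641e8d0f74 centos_6.6_ssh    # container ID, name of the new image

# Copy a file from the host into a Docker container
docker cp xxx.zip <container_id>:/tmp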

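4. FAQ

Q: The CANN environment on the host is incompatible with the one in the Docker image

Running atc in the container prints "DrvMngGetConsoleLogLevel failed" repeatedly, even though the conversion reports success: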

root@2a9571160a25:~/work# atc --model=./resnet_v1_50.pb --framework=3 --output=./resnet50v1_framework_tensorflow_aipp_0_batch_1_input_fp16_output_FP32 --output_type=FP32 --soc_version=Ascend310 --input_shape=input:1,224,224,3
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
ATC start working now, please wait for a moment.
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
DrvMngGetConsoleLogLevel failed. (g_conLogLevel=3)
ATC run success, welcome to the next use.
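Cause:
The host's driver is mounted into the container, while the Docker image ships its own nnrt package. The CANN version in that nnrt package is not compatible with the host driver, so the CANN environments on the host and in the image do not match.

Solution:
1. When starting the Docker container, change the mount
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver
to
-v /usr/local/Ascend:/usr/local/Ascend
(this uses the CANN software installed on the host, so the host must have ascend-toolkit installed)

2. Change the environment setup script
source /usr/local/Ascend/nnrt/set_env.sh
to
source /usr/local/Ascend/ascend-toolkit/set_env.sh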

Q:Failed to start firewalld - dynamic firewall daemon

Complete solution: Failed to start firewalld – dynamic firewall daemon

Failed to start firewalld - dynamic firewall daemon.
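Cause:
1. The Python interpreter used by the firewalld scripts.
2. A problem with the firewalld process itself.

Method 1:
Edit the first line of the following scripts:
/usr/sbin/firewalld
/usr/bin/firewall-cmd

#!/usr/bin/python -Es
to
#!/usr/bin/python2

Method 2:
If Method 1 does not help, try Method 2: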
systemctl stop firewalld
pkill -f firewalld
systemctl start firewalld
systemctl status firewalld
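Method 1 can also be applied with sed instead of editing each file by hand. The following is a minimal sketch that assumes GNU sed and a /usr/bin/python2 interpreter on the host (present on a standard CentOS 7 install). It uses a hypothetical fix_firewalld_shebang function that changes line 1 only when it is exactly #!/usr/bin/python -Es and keeps a .bak copy of every file it processes.

```shell
#!/usr/bin/env bash
# Sketch of Method 1: point the firewalld scripts' shebang at python2.
# A .bak copy of each processed file is kept so the change can be reverted.
fix_firewalld_shebang() {
  local f
  for f in "$@"; do
    [ -f "$f" ] || { echo "skip: $f not found" >&2; continue; }
    # Only line 1 is touched, and only if it is exactly the original shebang.
    sed -i.bak '1s|^#!/usr/bin/python -Es$|#!/usr/bin/python2|' "$f"
  done
}
```

Usage: fix_firewalld_shebang /usr/sbin/firewalld /usr/bin/firewall-cmd, then systemctl restart firewalld.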

Q: /etc/sysconfig/iptables not found

Reference: CentOS 7 firewall: allowing specific IPs and ports

The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
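Cause: the iptables-services package, which provides the save action and the /etc/sysconfig/iptables file, is not installed.
Solution:
1. Install (or update) iptables-services:
yum install iptables-services

2. Save the configuration:
service iptables save

3. After these commands finish, the /etc/sysconfig/iptables file exists.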

Q: docker run fails

docker: Error response from daemon: driver failed programming external connectivity on endpoint ascend-infer-arm-1 (1c508946fd80fe24a2bcb62f63fbd08e3a98f7834a42660c368cb3a0609fec80):  (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 6018 -j DNAT --to-destination 172.17.0.3:22 ! -i docker0: iptables: No chain/target/match by that name.
 (exit status 1)).
ERRO[0000] error waiting for container: context canceled
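Cause:
The DOCKER chain that Docker creates in the nat table is missing, typically because the iptables rules were flushed or reloaded (for example by restarting the iptables service in step 3.5) after Docker had started.
Solution:
Restart Docker so that it recreates its iptables chains:
systemctl restart docker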
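[Copyright notice] This article is original content by a Huawei Cloud community user. When reposting, you must state the source (Huawei Cloud community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find suspected plagiarism in this community, please report it by email with supporting evidence; once verified, the community will remove the infringing content immediately. Reporting email: cloudbbs@huaweicloud.com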