Worker node stuck in NotReady after initialization with kubeadm

zuozewei, posted 2021/09/30 18:47:32
[Abstract] After running kubeadm join, a newly added worker node (k8s-node-4) stays in NotReady. This post walks through the troubleshooting and the fix: locate the failing calico Pod, pull (or load offline) its container images on the node, and recreate the Pod to bring the node back to Ready.

The problem

After running kubeadm join on the worker node, the cluster keeps reporting the node as NotReady, as seen below for k8s-node-4:

$ kubectl get nodes
NAME                     STATUS     ROLES    AGE    VERSION
k8s-jmeter-1.novalocal   Ready      <none>   17d    v1.18.5
k8s-jmeter-2.novalocal   Ready      <none>   17d    v1.18.5
k8s-jmeter-3.novalocal   Ready      <none>   17d    v1.18.5
k8s-master.novalocal     Ready      master   51d    v1.18.5
k8s-node-1.novalocal     Ready      <none>   51d    v1.18.5
k8s-node-2.novalocal     Ready      <none>   51d    v1.18.5
k8s-node-3.novalocal     Ready      <none>   51d    v1.18.5
k8s-node-4.novalocal     NotReady   <none>   160m   v1.18.5
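
At this point, the node's own conditions already hint at the cause; on a freshly joined node, NotReady most often means the CNI network plugin is not yet up. An optional generic check (not part of the original post):

$ kubectl describe node k8s-node-4.novalocal
# check the Conditions section: the Ready condition's message explains
# why the kubelet reports NotReady (typically the network/CNI plugin
# on the node is not initialized yet)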

Troubleshooting

First, check how the system Pods initialized:

$ kubectl get pod -n kube-system -o wide
NAME                                           READY   STATUS                  RESTARTS   AGE     IP               NODE                     NOMINATED NODE   READINESS GATES
calico-kube-controllers-5b8b769fcd-srkrb       1/1     Running                 0          3d19h   10.100.185.9     k8s-jmeter-2.novalocal   <none>           <none>
calico-node-5c8xj                              1/1     Running                 10         51d     172.16.106.227   k8s-node-1.novalocal     <none>           <none>
calico-node-9d7rt                              1/1     Running                 8          51d     172.16.106.203   k8s-node-3.novalocal     <none>           <none>
calico-node-crczj                              1/1     Running                 5          51d     172.16.106.226   k8s-node-2.novalocal     <none>           <none>
calico-node-g4hx4                              0/1     Init:ImagePullBackOff   0          99s     172.16.106.219   k8s-node-4.novalocal     <none>           <none>
calico-node-gpmsv                              1/1     Running                 5          17d     172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
calico-node-pz7w5                              1/1     Running                 4          51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
calico-node-r59bw                              1/1     Running                 3          17d     172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
calico-node-xhjj8                              1/1     Running                 4          17d     172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
coredns-66db54ff7f-2cxcp                       1/1     Running                 0          5d22h   10.100.167.140   k8s-node-1.novalocal     <none>           <none>
coredns-66db54ff7f-gptgt                       1/1     Running                 0          5d22h   10.100.41.31     k8s-master.novalocal     <none>           <none>
eip-nfs-nfs-storage-6fddcc8f9d-hqv7m           1/1     Running                 0          3d19h   10.100.185.4     k8s-jmeter-2.novalocal   <none>           <none>
etcd-k8s-master.novalocal                      1/1     Running                 0          5d21h   172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-apiserver-k8s-master.novalocal            1/1     Running                 14         51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-controller-manager-k8s-master.novalocal   1/1     Running                 56         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-proxy-5msrp                               1/1     Running                 1          9d      172.16.106.226   k8s-node-2.novalocal     <none>           <none>
kube-proxy-64pkw                               1/1     Running                 2          9d      172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
kube-proxy-6j2fw                               1/1     Running                 1          9d      172.16.106.203   k8s-node-3.novalocal     <none>           <none>
kube-proxy-7cptn                               1/1     Running                 0          157m    172.16.106.219   k8s-node-4.novalocal     <none>           <none>
kube-proxy-fkt9p                               1/1     Running                 1          9d      172.16.106.227   k8s-node-1.novalocal     <none>           <none>
kube-proxy-fxvjb                               1/1     Running                 4          9d      172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
kube-proxy-wnj2l                               1/1     Running                 2          9d      172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
kube-proxy-wnzqg                               1/1     Running                 0          9d      172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-scheduler-k8s-master.novalocal            1/1     Running                 48         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kuboard-5cc4bcccd7-t8h8f                       1/1     Running                 0          21h     10.100.185.24    k8s-jmeter-2.novalocal   <none>           <none>
metrics-server-677dcb8b4d-jtpgd                1/1     Running                 0          3d20h   172.16.106.227   k8s-node-1.novalocal     <none>           <none>

From the output we can see that the calico component on node-4 failed to initialize: its Pod is stuck in Init:ImagePullBackOff.
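
To confirm that this is purely an image-pull failure, the Pod's events carry the exact error message. An optional check, not part of the original output:

$ kubectl describe pod calico-node-g4hx4 -n kube-system
# the Events section at the bottom lists the failed image pulls and
# which image the kubelet could not fetch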

The fix

Identify the container images

List the container images used by a calico Pod (any Pod from the same calico-node DaemonSet will do; here calico-node-7vrgx):

$ kubectl get pods calico-node-7vrgx -n kube-system -o yaml | grep image:
            f:image: {}
            f:image: {}
            f:image: {}
            f:image: {}
    image: calico/node:v3.13.1
    image: calico/cni:v3.13.1
    image: calico/cni:v3.13.1
  - image: calico/pod2daemon-flexvol:v3.13.1
  - image: calico/node:v3.13.1
  - image: calico/cni:v3.13.1
  - image: calico/cni:v3.13.1
  - image: calico/pod2daemon-flexvol:v3.13.1
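
The f:image: {} entries above are managedFields metadata picked up by the grep. For a cleaner listing, a jsonpath query (a standard kubectl option) prints only the image names:

$ kubectl get pod calico-node-7vrgx -n kube-system \
    -o jsonpath='{.spec.initContainers[*].image}{"\n"}{.spec.containers[*].image}{"\n"}'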

We can see that this calico Pod mainly uses the following three images:

  • calico/node:v3.13.1
  • calico/cni:v3.13.1
  • calico/pod2daemon-flexvol:v3.13.1

Pull the images

Log in to the node-4 host and run:

$ docker pull calico/node:v3.13.1
$ docker pull calico/cni:v3.13.1  
$ docker pull calico/pod2daemon-flexvol:v3.13.1
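
Once the pulls finish, you can confirm the three images are present on node-4 (optional check):

$ docker images | grep calico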

Load the images offline

If docker pull cannot reach the registry, you can instead export the calico images from another node:

# save the image to a local archive
$ docker save image_id -o xxxx.tar

# copy the archive to the worker node
$ scp xxxx.tar root@k8s-node-4:/root/

# load the image on the worker node
$ docker load -i xxxx.tar

# re-tag the image (saving by image ID does not preserve repository:tag)
$ docker tag image_id tag
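
Note that if you save the images by repository name rather than by image ID, the tags are preserved inside the archive and the final docker tag step is unnecessary. A minimal sketch, using a hypothetical archive name calico-v3.13.1.tar:

# on a node that already has the images
$ docker save -o calico-v3.13.1.tar \
    calico/node:v3.13.1 calico/cni:v3.13.1 calico/pod2daemon-flexvol:v3.13.1

# copy the archive to the worker node and load it there
$ scp calico-v3.13.1.tar root@k8s-node-4:/root/
$ docker load -i /root/calico-v3.13.1.tar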

Recreate the Pod

On the master, delete the existing Pod:

$ kubectl delete pod calico-node-g4hx4 -n kube-system
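
Because calico-node is managed by a DaemonSet, the DaemonSet controller immediately schedules a replacement Pod on node-4, which now has the images available locally. To watch the replacement come up (optional):

$ kubectl rollout status daemonset/calico-node -n kube-system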

Wait a moment and check the Pod status again:

$ kubectl get pod -n kube-system -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP               NODE                     NOMINATED NODE   READINESS GATES
calico-kube-controllers-5b8b769fcd-srkrb       1/1     Running   0          3d19h   10.100.185.9     k8s-jmeter-2.novalocal   <none>           <none>
calico-node-5c7hn                              0/1     Running   0          8s      172.16.106.219   k8s-node-4.novalocal     <none>           <none>
calico-node-5c8xj                              1/1     Running   10         51d     172.16.106.227   k8s-node-1.novalocal     <none>           <none>
calico-node-9d7rt                              1/1     Running   8          51d     172.16.106.203   k8s-node-3.novalocal     <none>           <none>
calico-node-crczj                              1/1     Running   5          51d     172.16.106.226   k8s-node-2.novalocal     <none>           <none>
calico-node-gpmsv                              1/1     Running   5          17d     172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
calico-node-pz7w5                              1/1     Running   4          51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
calico-node-r59bw                              1/1     Running   3          17d     172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
calico-node-xhjj8                              1/1     Running   4          17d     172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
coredns-66db54ff7f-2cxcp                       1/1     Running   0          5d22h   10.100.167.140   k8s-node-1.novalocal     <none>           <none>
coredns-66db54ff7f-gptgt                       1/1     Running   0          5d22h   10.100.41.31     k8s-master.novalocal     <none>           <none>
eip-nfs-nfs-storage-6fddcc8f9d-hqv7m           1/1     Running   0          3d19h   10.100.185.4     k8s-jmeter-2.novalocal   <none>           <none>
etcd-k8s-master.novalocal                      1/1     Running   0          5d21h   172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-apiserver-k8s-master.novalocal            1/1     Running   14         51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-controller-manager-k8s-master.novalocal   1/1     Running   56         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-proxy-5msrp                               1/1     Running   1          9d      172.16.106.226   k8s-node-2.novalocal     <none>           <none>
kube-proxy-64pkw                               1/1     Running   2          9d      172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
kube-proxy-6j2fw                               1/1     Running   1          9d      172.16.106.203   k8s-node-3.novalocal     <none>           <none>
kube-proxy-7cptn                               1/1     Running   0          160m    172.16.106.219   k8s-node-4.novalocal     <none>           <none>
kube-proxy-fkt9p                               1/1     Running   1          9d      172.16.106.227   k8s-node-1.novalocal     <none>           <none>
kube-proxy-fxvjb                               1/1     Running   4          9d      172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
kube-proxy-wnj2l                               1/1     Running   2          9d      172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
kube-proxy-wnzqg                               1/1     Running   0          9d      172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-scheduler-k8s-master.novalocal            1/1     Running   48         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kuboard-5cc4bcccd7-t8h8f                       1/1     Running   0          21h     10.100.185.24    k8s-jmeter-2.novalocal   <none>           <none>
metrics-server-677dcb8b4d-jtpgd                1/1     Running   0          3d20h   172.16.106.227   k8s-node-1.novalocal     <none>           <none>

The replacement calico-node Pod on node-4 is now Running (it shows 0/1 READY only because it has just started), and all other Pods are healthy. Now check the node status:

$ kubectl get nodes
NAME                     STATUS   ROLES    AGE    VERSION
k8s-jmeter-1.novalocal   Ready    <none>   17d    v1.18.5
k8s-jmeter-2.novalocal   Ready    <none>   17d    v1.18.5
k8s-jmeter-3.novalocal   Ready    <none>   17d    v1.18.5
k8s-master.novalocal     Ready    master   51d    v1.18.5
k8s-node-1.novalocal     Ready    <none>   51d    v1.18.5
k8s-node-2.novalocal     Ready    <none>   51d    v1.18.5
k8s-node-3.novalocal     Ready    <none>   51d    v1.18.5
k8s-node-4.novalocal     Ready    <none>   161m   v1.18.5

With that, the newly initialized node is back to a normal Ready state.
