An incident of a k8s node going NotReady - Cloud Community - Huawei Cloud

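Nodes in our test-environment k8s cluster kept going down and then recovering on their own, which affected the systems served by the pods running on them. The events were as follows: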

Judging from the error Image garbage collection failed: non-existent label "docker-images", something like a docker-images label did not exist.

So I checked the status and logs of kubelet and docker with the following commands:

systemctl status docker -l

systemctl status kubelet -l

[root@node-108 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Active: active (running) since 三 2023-03-15 16:29:34 CST; 30min ago

[root@node-108 ~]# systemctl status docker -l
● docker.service - Docker Application Container Engine
   Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─docker-dns.conf, docker-options.conf, docker-orphan-cleanup.conf
   Active: active (running) since 三 2023-03-15 16:29:34 CST; 30min ago

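The output shows that both kubelet and docker had been restarted recently. Given the error Image garbage collection failed: non-existent label "docker-images", my guess was that docker had restarted, so kubelet reported that it could not find the docker-images label.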

To find out how often docker was restarting, I set up a cron job:

[root@node-108 ~]# crontab -l
*/1 * * * * echo `date` `systemctl status docker |grep Active`  >> /home/dockerstatus.log

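/home/dockerstatus.log shows that docker restarted between 16:21 and 16:24, and that the cron job itself did not run during that window (the 16:22 and 16:23 entries are missing).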

2023年 03月 15日 星期三 16:21:01 CST Active: active (running) since 三 2023-03-15 16:02:18 CST; 18min ago
2023年 03月 15日 星期三 16:24:01 CST Active: activating (start) since 三 2023-03-15 16:23:21 CST; 40s ago
2023年 03月 15日 星期三 16:25:01 CST Active: deactivating (stop-sigterm) (Result: timeout)
2023年 03月 15日 星期三 16:26:01 CST Active: activating (start) since 三 2023-03-15 16:25:51 CST; 9s ago
2023年 03月 15日 星期三 16:27:01 CST Active: deactivating (stop-sigterm) (Result: timeout)
2023年 03月 15日 星期三 16:28:01 CST Active: deactivating (stop-sigterm) (Result: timeout)
2023年 03月 15日 星期三 16:29:01 CST Active: activating (start) since 三 2023-03-15 16:28:22 CST; 39s ago
2023年 03月 15日 星期三 16:30:01 CST Active: active (running) since 三 2023-03-15 16:29:34 CST; 27s ago
2023年 03月 15日 星期三 16:31:01 CST Active: active (running) since 三 2023-03-15 16:29:34 CST; 1min 27s ago
2023年 03月 15日 星期三 16:32:01 CST Active: active (running) since 三 2023-03-15 16:29:34 CST; 2min 27s ago
2023年 03月 15日 星期三 16:33:01 CST Active: active (running) since 三 2023-03-15 16:29:34 CST; 3min 27s ago
2023年 03月 15日 星期三 16:34:01 CST Active: active (running) since 三 2023-03-15 16:29:34 CST; 4min 27s ago
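
Reading a log like this by eye gets tedious once it grows, so here is a minimal sketch of how the two signals above could be pulled out automatically. It assumes the log format shown here (the fifth whitespace-separated field is the HH:MM:SS sample time) and that all samples fall on the same day. The function names find_gaps and list_starts are my own, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the log layout is assumed to
# match /home/dockerstatus.log as produced by the cron job above.

# Report places where consecutive samples are more than one minute apart,
# i.e. minutes in which the cron job did not run at all.
# No midnight rollover handling: all samples are assumed to be on one day.
find_gaps() {
    awk '{
        split($5, t, ":")
        now = t[1] * 60 + t[2]              # minutes since midnight
        if (NR > 1 && now - prev > 1)
            printf "gap: %s -> %s (%d samples missing)\n", prev_ts, $5, now - prev - 1
        prev = now
        prev_ts = $5
    }' "$1"
}

# Print each distinct "active (running) since ..." start time once, in order
# of first appearance. Every new value means docker reached running state again.
list_starts() {
    awk '/Active: active \(running\) since/ {
        sub(/.*since /, "")                 # drop everything up to "since "
        sub(/;.*/, "")                      # drop the trailing "; 18min ago"
        if (!seen[$0]++) print
    }' "$1"
}
```

Calling find_gaps /home/dockerstatus.log and list_starts /home/dockerstatus.log on the node would then summarize the missed samples and the distinct docker start times. A missed cron sample is a stronger hint than a docker restart alone, since it suggests the whole host was unavailable.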

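So I ran who -b to check when the system last booted, and found that the host had rebooted on its own at 16:22 (系统引导 in the output below is the zh_CN locale's rendering of "system boot").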

[root@node-108 ~]# who -b
         系统引导 2023-03-15 16:22
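
who -b only reports the most recent boot. To see whether the host reboots repeatedly, and whether the previous boot ended cleanly, something like the sketch below may help. The wrapper name show_reboot_history is hypothetical, and the last journalctl call only returns data if the journal is persisted across reboots, which is an assumption about this host's configuration.

```shell
#!/usr/bin/env bash
# Sketch only: show_reboot_history is a hypothetical wrapper around
# standard tools; run it as root on the affected node.
show_reboot_history() {
    # Recent reboot and shutdown records from wtmp.
    last -x reboot shutdown | head -n 20
    # Boots known to the journal (more than one entry needs persistent storage).
    journalctl --list-boots --no-pager
    # Last 50 log lines of the previous boot, i.e. what happened right before
    # the reboot. Assumes the journal survived the reboot.
    journalctl -b -1 -n 50 --no-pager
}
```

If the previous boot's log simply stops with no shutdown messages, that points toward a hard reset, power loss, or kernel panic rather than an orderly reboot, though this is only a heuristic.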

The host itself is probably unstable, so for now all I could do was mark the node as unschedulable.

[root@master-101 ~]# kubectl cordon node-108
node/node-108 cordoned
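
Note that cordon only stops new pods from being scheduled onto the node; pods already running there stay put. If the goal is to move workloads off the unstable host as well, a drain is the usual next step. The sketch below is one way that could look. The helper names evacuate_node and restore_node are hypothetical, and --delete-emptydir-data assumes a reasonably recent kubectl (older releases used --delete-local-data instead).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; run from a machine with
# kubectl access to the cluster (master-101 in this post).

# Mark the node unschedulable, then evict the pods running on it.
# DaemonSet-managed pods are skipped; emptyDir data on evicted pods is lost.
evacuate_node() {
    kubectl cordon "$1" &&
    kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Make the node schedulable again once the host is considered stable.
restore_node() {
    kubectl uncordon "$1"
}
```

For example, evacuate_node node-108 would cordon and drain the node from this incident, and restore_node node-108 would bring it back once the host has been fixed.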
