etcd Snapshot Backup

Posted by leepongmin on 2024/05/07 17:12:47
[Summary] Backing up etcd data for a Kubernetes cluster

1. Health check

# Check cluster health
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://10.10.50.31:2379,https://10.10.50.32:2379,https://10.10.50.33:2379 endpoint health

https://10.10.50.31:2379 is healthy: successfully committed proposal: took = 1.698385ms
https://10.10.50.32:2379 is healthy: successfully committed proposal: took = 1.577913ms
https://10.10.50.33:2379 is healthy: successfully committed proposal: took = 5.616079ms
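When this check runs from a script or a cron job, the output lines can be scanned for endpoints that are not reported as healthy. The following is a minimal sketch, assuming the line format shown above; the function name count_unhealthy is hypothetical and not part of etcdctl.

```shell
#!/usr/bin/env bash
# count_unhealthy (hypothetical helper): read `etcdctl endpoint health`
# output on stdin and print how many endpoint lines are NOT reported healthy.
count_unhealthy() {
  awk '/^https?:\/\// && $0 !~ / is healthy/ { n++ } END { print n + 0 }'
}

# Example usage (not run here); 2>&1 captures both output streams:
# ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem \
#   --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem \
#   --endpoints=https://10.10.50.31:2379,https://10.10.50.32:2379,https://10.10.50.33:2379 \
#   endpoint health 2>&1 | count_unhealthy
```

A result of 0 means every listed endpoint was reported healthy; any other value is a reason to hold off on a backup or restore.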

# Get the value of a single key
$ ETCDCTL_API=3 etcdctl \
--cacert=/opt/kubernetes/ssl/ca.pem \
--cert=/opt/kubernetes/ssl/server.pem \
--key=/opt/kubernetes/ssl/server-key.pem \
--endpoints=https://10.10.50.31:2379,https://10.10.50.32:2379,https://10.10.50.33:2379 \
get /registry/apiregistration.k8s.io/apiservices/v1.apps

# Get etcd version information
$ ETCDCTL_API=3 etcdctl \
--cacert=/opt/kubernetes/ssl/ca.pem \
--cert=/opt/kubernetes/ssl/server.pem \
--key=/opt/kubernetes/ssl/server-key.pem \
--endpoints=https://10.10.50.31:2379,https://10.10.50.32:2379,https://10.10.50.33:2379 \
version

# List all keys in etcd
$ ETCDCTL_API=3 etcdctl \
--cacert=/opt/kubernetes/ssl/ca.pem \
--cert=/opt/kubernetes/ssl/server.pem \
--key=/opt/kubernetes/ssl/server-key.pem \
--endpoints=https://10.10.50.31:2379,https://10.10.50.32:2379,https://10.10.50.33:2379 \
get / --prefix --keys-only
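The full key listing is long on a real cluster, so a per-type summary is often easier to read. The sketch below assumes the /registry/<type>/... key layout seen in the single-key example; count_keys_by_type is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_keys_by_type (hypothetical helper): read `get / --prefix --keys-only`
# output on stdin and print "<count> <type>" per first path segment under
# /registry, highest count first. Blank lines in the input are ignored.
count_keys_by_type() {
  awk -F/ '$2 == "registry" && $3 != "" { c[$3]++ }
           END { for (k in c) print c[k], k }' | sort -k1,1nr -k2,2
}

# Example usage (not run here):
# ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem \
#   --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem \
#   --endpoints=https://10.10.50.31:2379 \
#   get / --prefix --keys-only | count_keys_by_type
```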

=======================================================================================
# Check health (certificate paths as used by kubeadm-style deployments)
$ ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key --write-out=table \
--endpoints=127.0.0.1:2379 endpoint health

# List cluster members
$ ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key --write-out=table \
--endpoints=127.0.0.1:2379 member list

# Find the leader (see the IS LEADER column; --cluster queries every member in the member list)
$ ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key --write-out=table \
--endpoints=127.0.0.1:2379 endpoint status --cluster

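2. Backup with a command

  • Note: etcdctl commands differ slightly between etcd versions but are broadly the same. This article backs up with snapshot save; taking the snapshot from a single node each time is sufficient.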
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem \
--cert=/opt/kubernetes/ssl/server.pem \
--key=/opt/kubernetes/ssl/server-key.pem \
--endpoints=https://10.10.50.31:2379 \
snapshot save /data/etcd_backup_dir/etcd-snapshot-`date +%Y%m%d`.db
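After a snapshot is written, it is worth confirming that the file exists, is not empty, and can be read back. The sketch below wraps etcdctl snapshot status, which prints the hash, revision, total keys, and size of a snapshot file. The function name verify_snapshot is hypothetical. In etcd 3.5 and later the same subcommand is also available through etcdutl, and the etcdctl form is marked deprecated there.

```shell
#!/usr/bin/env bash
# verify_snapshot (hypothetical helper): fail early if the snapshot file is
# missing or empty, then ask etcdctl to read it back and summarize it.
verify_snapshot() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "snapshot missing or empty: $file" >&2
    return 1
  fi
  ETCDCTL_API=3 etcdctl snapshot status "$file" --write-out=table
}

# Example usage (not run here):
# verify_snapshot "/data/etcd_backup_dir/etcd-snapshot-$(date +%Y%m%d).db"
```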

3. Backup with a script

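#!/usr/bin/env bash
# Take a dated etcd snapshot, then prune snapshots older than 30 days.
# Stop on the first error so a failed snapshot never triggers the pruning step.
set -euo pipefail

date
CACERT="/opt/kubernetes/ssl/ca.pem"
CERT="/opt/kubernetes/ssl/server.pem"
KEY="/opt/kubernetes/ssl/server-key.pem"
ENDPOINTS="https://10.10.50.31:2379"
BACKUP_DIR="/data/etcd_backup_dir"

ETCDCTL_API=3 etcdctl \
--cacert="${CACERT}" --cert="${CERT}" --key="${KEY}" \
--endpoints="${ENDPOINTS}" \
snapshot save "${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d).db"

# Keep backups for 30 days (the pattern is quoted so the shell does not expand it)
find "${BACKUP_DIR}/" -name '*.db' -mtime +30 -exec rm -f {} \;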
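A retention step like the one above deletes files, so it can be worth trying on a scratch directory before pointing it at real backups. The sketch below is one way to package it; prune_snapshots is a hypothetical function name, and -maxdepth assumes GNU or BSD find.

```shell
#!/usr/bin/env bash
# prune_snapshots DIR DAYS (hypothetical helper): delete *.db files directly
# inside DIR whose modification time is more than DAYS days old.
# Files in subdirectories and files with other extensions are left alone.
prune_snapshots() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name '*.db' -mtime +"$days" -exec rm -f {} +
}

# Example usage (not run here):
# prune_snapshots /data/etcd_backup_dir 30
```

Replacing -exec rm -f {} + with -print turns the same call into a dry run that only lists what would be removed.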

4. Restore

# Stop the kube-apiserver service on every master
$ systemctl stop kube-apiserver
$ ps -ef | grep kube-apiserver

# Stop the etcd service on every cluster member
$ systemctl stop etcd

# Move the existing data directory aside on every etcd member
$ mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak

# Copy the etcd snapshot to the other masters
# Run on k8s-master1
$ scp /data/etcd_backup_dir/etcd-snapshot-20231222.db root@k8s-master2:/data/etcd_backup_dir/
$ scp /data/etcd_backup_dir/etcd-snapshot-20231222.db root@k8s-master3:/data/etcd_backup_dir/
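  • Restore the snapshot (must be run on every master host)
# Run on k8s-master1. On k8s-master2 and k8s-master3, set --name and
# --initial-advertise-peer-urls to that member's own entry in --initial-cluster.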
$ ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20231222.db \
  --name etcd-0 \
  --initial-cluster "etcd-0=https://10.10.50.31:2380,etcd-1=https://10.10.50.32:2380,etcd-2=https://10.10.50.33:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://10.10.50.31:2380 \
  --data-dir=/var/lib/etcd/default.etcd
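Every member restores from the same snapshot, and only --name and --initial-advertise-peer-urls change from host to host. The sketch below prints (but does not run) the restore command for each member, deriving the name and peer URL from the --initial-cluster string used above; print_restore_cmd is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Values taken from the restore command above.
SNAPSHOT="/data/etcd_backup_dir/etcd-snapshot-20231222.db"
INITIAL_CLUSTER="etcd-0=https://10.10.50.31:2380,etcd-1=https://10.10.50.32:2380,etcd-2=https://10.10.50.33:2380"

# print_restore_cmd NAME PEER_URL (hypothetical helper): print, not run,
# the restore command for one member.
print_restore_cmd() {
  local name="$1" peer_url="$2"
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --name %s --initial-cluster "%s" --initial-cluster-token etcd-cluster --initial-advertise-peer-urls %s --data-dir=/var/lib/etcd/default.etcd\n' \
    "$SNAPSHOT" "$name" "$INITIAL_CLUSTER" "$peer_url"
}

# Split "name=url,name=url,..." into one entry per line, then into name and URL.
printf '%s\n' "$INITIAL_CLUSTER" | tr ',' '\n' | while IFS= read -r member; do
  print_restore_cmd "${member%%=*}" "${member#*=}"
done
```

Each printed line is meant to be reviewed and then run on the host whose peer URL it names.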
# After all three etcd members have been restored, log in to each machine in turn and start etcd
$ systemctl start etcd

# Once all three etcd members are running, check etcd cluster health
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem \
--cert=/opt/kubernetes/ssl/server.pem \
--key=/opt/kubernetes/ssl/server-key.pem \
--endpoints=https://10.10.50.31:2379,https://10.10.50.32:2379,https://10.10.50.33:2379 \
endpoint health
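Right after startup the members may need a few seconds to elect a leader, so a single health check can fail even when the restore went well. The sketch below is a generic retry wrapper that can be put around the health check; retry is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary example values.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND... (hypothetical helper): run COMMAND
# until it succeeds, giving up after ATTEMPTS tries.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example usage (not run here): up to 10 attempts, 3 seconds apart.
# retry 10 3 env ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem \
#   --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem \
#   --endpoints=https://10.10.50.31:2379,https://10.10.50.32:2379,https://10.10.50.33:2379 \
#   endpoint health
```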

# Once all three etcd members are healthy, start kube-apiserver on each master
$ systemctl start kube-apiserver

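# Check that the Kubernetes cluster is back to normal
# (componentstatuses is deprecated since Kubernetes v1.19; kubectl get nodes is another quick check)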
$ kubectl get cs

5. Summary

  • Backing up a Kubernetes cluster mainly means backing up the etcd cluster. When restoring, the main concern is the order of the steps:

  • Stop kube-apiserver --> stop etcd --> restore data --> start etcd --> start kube-apiserver
    
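  • Note: when backing up the etcd cluster, a snapshot from a single member is enough; when restoring, restore every member from that same snapshot.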
