Reference: https://cooting.cn/archives/166.html
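1. Identify the failed node
First take the failed node out of rotation on the HA layer.
Delete the node. List the nodes to confirm which one has failed:
kubectl get node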
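With many nodes it can help to filter the listing down to the unhealthy ones. The following is a minimal sketch, assuming the default `kubectl get node` table layout (header row, node name in column 1, STATUS in column 2). The function name `not_ready_nodes` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes whose STATUS does not start
# with "Ready". Reads `kubectl get node` table output on stdin.
# NR > 1 skips the header row; "Ready,SchedulingDisabled" still counts as Ready.
not_ready_nodes() {
    awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get node | not_ready_nodes
```

The helper only reads text, so it can be tried on saved output before running it against the cluster.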
Delete the failed master node:
kubectl delete node d0-office-master003
Open a shell in the etcd pod on one of the remaining healthy master nodes:
kubectl -n kube-system exec -it etcd-d0-office-master001 -- sh
Define an alias that covers all three endpoints (on etcd 3.3 or earlier, also export ETCDCTL_API=3 so that these v3 flags are accepted):
alias etcdctlold='etcdctl --endpoints=https://192.168.1.101:2379,https://192.168.1.102:2379,https://192.168.1.103:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
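Check the cluster status to get the etcd member IDs (if the failed node is unreachable, its endpoint will likely report an error here):
etcdctlold endpoint status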
List the cluster members:
etcdctlold member list
Remove the defunct member by its ID (here we remove the member that corresponds to https://192.168.1.103:2379):
etcdctlold member remove 75abeddc78aef692
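Copying the member ID by hand is easy to get wrong, so the lookup can be scripted. This is a sketch only, assuming the default (simple) output of `etcdctl member list`, where each line is `ID, status, name, peer URLs, client URLs` (newer versions append a learner flag) and fields are separated by a comma and a space. The name `member_id_for` is hypothetical, and `awk` must be available in the shell where it runs.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of the member whose client URL list
# contains the URL given as $1. Reads `etcdctl member list` output on stdin.
member_id_for() {
    awk -F', ' -v url="$1" '
        {
            # A member may advertise several client URLs, joined by ",".
            n = split($5, urls, ",")
            for (i = 1; i <= n; i++)
                if (urls[i] == url) { print $1; break }
        }'
}

# Usage inside the etcd pod (not executed here):
#   etcdctlold member list | member_id_for https://192.168.1.103:2379
```

It is worth comparing the printed ID against the member list by eye before passing it to `member remove`, since removal cannot be undone without re-adding the member.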
------------------------------------------------------
The master node has now been removed. To make sure nothing went wrong, check the cluster once more.
Define a new alias that covers only the two remaining endpoints:
alias etcdctlnew='etcdctl --endpoints=https://192.168.1.101:2379,https://192.168.1.102:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
Check the cluster status:
etcdctlnew endpoint status
List the cluster members:
etcdctlnew member list
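The final check can also be expressed as two small assertions: the removed endpoint no longer appears, and the expected number of members report `started`. This is a rough sketch that assumes the same simple `member list` format (`ID, status, name, peer URLs, client URLs`). Both function names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if no line of the member list
# on stdin mentions the URL given as $1. Uses a fixed-string substring match.
member_absent() {
    ! grep -qF -- "$1"
}

# Hypothetical helper: print how many members have status "started".
started_count() {
    awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}

# Usage inside the etcd pod (not executed here):
#   etcdctlnew member list | member_absent https://192.168.1.103:2379 && echo "removed"
#   etcdctlnew member list | started_count
```

After removing one of three members, `started_count` would be expected to print 2.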
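Note: while a master is being removed, do not modify cluster components or configuration and do not restart services (if HPA is in use, stop it first). Do not leave the cluster with an even number of master nodes for long.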
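The warning about even member counts follows from etcd's majority quorum: a cluster of n members needs floor(n/2) + 1 of them to agree, so a two-member cluster tolerates no failures at all. The short sketch below just works through that arithmetic. The function names are illustrative.

```shell
#!/bin/sh
# Quorum for a cluster of n members: floor(n/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the cluster still has quorum.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the values for small cluster sizes.
for n in 1 2 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerance=$(fault_tolerance "$n")"
done
```

With two members the tolerance is 0, the same as a single member but with twice the chance that some member fails, which is why the third master should be restored promptly.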