Problem description
I set up a Kubernetes cluster with a single master node and two worker nodes using kubeadm, and I am trying to figure out how to recover from node failure.
When a worker node fails, recovery is straightforward: I create a new worker node from scratch, run kubeadm join, and everything is fine.
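(For reference, re-joining a rebuilt worker usually follows the sketch below; the address, token, and hash are placeholders, and `kubeadm token create --print-join-command` assumes a reasonably recent kubeadm.)

```bash
# On the control-plane node: print a join command with a fresh bootstrap token
kubeadm token create --print-join-command

# On the rebuilt worker: run the command it prints, which has this general shape
# (the values below are placeholders, not taken from this cluster)
kubeadm join <master-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```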
However, I cannot figure out how to recover from master node failure without interrupting the deployments running on the worker nodes. Do I need to back up and restore the original certificates, or can I just run kubeadm init to create a new master from scratch? How do I join the existing worker nodes?
Recommended answer
I ended up writing a Kubernetes CronJob that backs up the etcd data. If you are interested, I wrote a blog post about it: https://labs.consol.de/kubernetes/2018/05/25/kubeadm-backup.html
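As a rough illustration of what such a backup runs (a sketch, not the exact CronJob from the blog post), an etcd snapshot on a default kubeadm master looks roughly like this; the certificate paths are kubeadm's standard static-pod etcd layout, and the backup destination is an arbitrary choice:

```bash
# Sketch: snapshot the local etcd member of a default kubeadm cluster.
# Certificate paths are kubeadm defaults; /var/backups is an arbitrary choice.
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot-$(date +%F).db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
```

Wrapping a command like this in a CronJob and shipping the snapshot off the node is the idea behind the linked post.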
In addition to that, you may want to back up all of /etc/kubernetes/pki to avoid issues with secrets (tokens) having to be renewed.
For example, kube-proxy uses a secret to store a token, and this token becomes invalid if only the etcd certificate is backed up.
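A minimal sketch of that PKI backup, assuming you keep the archive next to the etcd snapshot:

```bash
# Sketch: archive the kubeadm PKI directory (certificates and keys)
# alongside the etcd snapshot; store both somewhere off the master.
tar czf /var/backups/kubernetes-pki-$(date +%F).tar.gz -C /etc/kubernetes pki
```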