文章目录
问题背景
kubelet、docker和containerd 的工作目录默认都在 /var/lib 下。
但是我们学校实验室租的线上机器挂载在 /
的磁盘空间很小,挂载在 /mnt/data_mnt/
的数据盘空间大。
应该是因为工作目录的原因,当 /
占用超过 80% 时, kubelet 会认为磁盘空间不足,因为 DiskPressure 而进入 NotReady 状态。
(以下是迁移后)
root@iZhp3hqett0mw795req5b2Z:~# df -h | head
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 16G 19M 16G 1% /run
/dev/vda1 99G 48G 46G 51% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vdb1 493G 120G 348G 26% /mnt/data_mnt
overlay 99G 48G 46G 51% /var/lib/containers/storage/overlay/54a47bbff1442f521326770cab94eb3221d82b0ff9e997c1b2efe6cad811b21b/merged
overlay 99G 48G 46G 51% /var/lib/containers/storage/overlay/a74d553e701c85c5ad25fd14a8fd30383e0dc21f4b567bc81e6b7ac74bc73524/merged
迁移
Docker
停止 Docker 服务
删除所有容器后。
systemctl stop docker
修改配置
Docker配置文件在 /etc/docker/daemon.json
,增加字段设置数据目录。
参考官网文档 https://docs.docker.com/config/daemon/#daemon-data-directory
修改后示例:
{
"registry-mirrors": [
"https://dockerhub.azk8s.cn",
"https://hub-mirror.c.163.com",
"https://reg-mirror.qiniu.com"
],
"builder": {
"gc": {
"defaultKeepStorage": "20GB",
"enabled": true
}
},
"experimental": true,
"features": {
"buildkit": false
},
"dns": ["8.8.8.8", "8.8.4.4"],
"data-root": "/mnt/data_mnt/var/lib/docker"
}
移动文件
把 /var/lib/docker
复制到 /mnt/data_mnt/var/lib/docker
重新启动 Docker 服务
systemctl start docker
# 跑一个 nginx 看看
docker run -p 80:80 nginx
# 查看服务状态
systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-10-17 22:28:47 CST; 12h ago
Docs: https://docs.docker.com
Main PID: 3917580 (dockerd)
Tasks: 25
Memory: 1.0G
CPU: 1min 16.247s
CGroup: /system.slice/docker.service
├─ 370428 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 5050 -container-ip 172.17.0.2 -container-port 5000
└─3917580 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Oct 18 10:11:41 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:11:41.715286425+08:00" level=error msg="Handler for POST /v1.41/containers/f66c7e907176ccd2abe010253448ab6dcab286c60f893b4cde72184215747d90/start returned error: driver
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.451142888+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5000/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455921606+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455953643+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455930600+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456010183+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456058582+08:00" level=info msg="Attempting next endpoint for push after error: no basic auth credentials"
Oct 18 10:18:56 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:18:56.354196507+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:19:02 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:02.439702060+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:19:07 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:07.267669420+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
containerd
停止服务
systemctl stop containerd
修改配置
配置文件在 /etc/containerd/config.toml
。
可以看到root = "/mnt/data_mnt/var/lib/containerd"
,可见工作目录默认在 /var/lib/containerd
万一不小心改乱了,可以重新生成默认配置:
containerd config default > /etc/containerd/config.toml
修改后例如:
version = 2
root = "/mnt/data_mnt/var/lib/containerd"
state = "/run/containerd"
oom_score = 0
[grpc]
address = "/run/containerd/containerd.sock"
uid = 0
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
[debug]
address = "/run/containerd/containerd-debug.sock"
uid = 0
gid = 0
level = "warn"
[timeouts]
"io.containerd.timeout.shim.cleanup" = "5s"
"io.containerd.timeout.shim.load" = "5s"
"io.containerd.timeout.shim.shutdown" = "3s"
"io.containerd.timeout.task.state" = "2s"
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "sealos.hub:5000/pause:3.9"
max_container_log_line_size = -1
max_concurrent_downloads = 20
disable_apparmor = false
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
runtime_engine = ""
runtime_root = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d"
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.configs."sealos.hub:5000".auth]
username = "admin"
password = "passw0rd"
移动文件
把 /mnt/data_mnt/var/lib/containerd
复制到 /var/lib/containerd
重新启动服务
systemctl start containerd
systemctl status containerd
kubelet(遇到问题待解决)
停止服务
systemctl stop kubelet
修改配置
kubelet 服务的配置,我的配置在 /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
。
注意同一目录可能还有个文件 /etc/systemd/system/kubelet.service.d/override.conf
实际运行中会用 override.conf 覆盖 10-kubeadm.conf 的内容。
修改后内容示例:
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/mnt/data_mnt/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/mnt/data_mnt/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
Environment="KUBELET_EXTRA_ARGS= \
\
\
--runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --image-service-endpoint=unix:///var/run/image-cri-shim.sock"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
另外还要修改 /etc/kubernetes/kubelet.conf
中配置的密钥地址,修改后示例(部分)
# 以上省略
users:
- name: system:node:izhp3hqett0mw795req5b2z
user:
client-certificate: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem
另外还要建软连接,因为读取密钥时,是通过名为“当前”的软连接找实际特定版本的密钥,移动后就乱套了。
ln -s kubelet-client-2023-10-07-11-14-02.pem kubelet-client-current.pem
移动文件(遇到问题待解决)
有些文件删除不了……
root@iZhp3hqett0mw795req5b2Z:~# rm -rf /var/lib/kubelet
rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~projected/kube-api-access-6jt8n': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~empty-dir/tmp-volume': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/54e7cb22-fdab-4e33-afb3-c8ba88d153a2/volumes/kubernetes.io~projected/kube-api-access-j84xs': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/d1a3fba3-3ab8-4ef9-b61c-6479b26c79f7/volumes/kubernetes.io~projected/kube-api-access-lf5tx': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/5e38f3a0-7f59-4d2e-98f4-1ec915e6ba89/volumes/kubernetes.io~projected/kube-api-access-prz4v': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/0f02517c-01c3-4b58-9f85-be169a92a31d/volumes/kubernetes.io~projected/kube-api-access-r4kxp': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/7098d438-0a9d-40df-aee1-ec4884ba262f/volumes/kubernetes.io~projected/kube-api-access-rqtwq': Device or resource busy
重新启动服务
systemctl start kubelet
systemctl status kubelet
使用的版本
日期:2023年10月18日
版本
root@iZhp3hqett0mw795req5b2Z:~# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
root@iZhp3hqett0mw795req5b2Z:~# docker version
Client:
Version: 20.10.21
API version: 1.41
Go version: go1.18.1
Git commit: 20.10.21-0ubuntu1~18.04.3
Built: Thu Apr 27 05:50:21 2023
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.21
API version: 1.41 (minimum version 1.12)
Go version: go1.18.1
Git commit: 20.10.21-0ubuntu1~18.04.3
Built: Thu Apr 27 05:36:22 2023
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.6.12-0ubuntu1~18.04.1
GitCommit:
runc:
Version: 1.1.4-0ubuntu1~18.04.2
GitCommit:
docker-init:
Version: 0.19.0
GitCommit:
root@iZhp3hqett0mw795req5b2Z:~# containerd --version
containerd github.com/containerd/containerd 1.6.12-0ubuntu1~18.04.1