Problem description
All of a sudden, I cannot deploy some images that could be deployed before. I get the following pod status:
[root@webdev2 origin]# oc get pods
NAME                      READY     STATUS             RESTARTS   AGE
arix-3-yjq9w              0/1       ImagePullBackOff   0          10m
docker-registry-2-vqstm   1/1       Running            0          2d
router-1-kvjxq            1/1       Running            0          2d
The application just won't start, and the pod is not trying to run the container. The Events page shows:
Back-off pulling image "172.30.84.25:5000/default/arix@sha256:d326
I have verified that I can pull the image by its tag with docker pull.
I have also checked the log of the last container. It was closed for some reason. I think the pod should at least try to restart it.
I have run out of ideas for debugging this. What else can I check?
Recommended answer
You can use the 'describe pod' syntax.
For OpenShift use:
oc describe pod <pod-id>
For vanilla Kubernetes:
kubectl describe pod <pod-id>
Examine the events in the output. In my case it shows Back-off pulling image unreachableserver/nginx:1.14.22222
In this case the image unreachableserver/nginx:1.14.22222 cannot be pulled from the Internet because there is no Docker registry unreachableserver and the image nginx:1.14.22222 does not exist.
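To see which registry a reference like the one above is actually requested from, the following is a minimal sketch of how Docker-style image references are commonly resolved. The function name image_registry is hypothetical, and the rule is a simplification: the first path component is treated as a registry host only if it contains a dot or a colon or is localhost; otherwise the reference falls back to Docker Hub (docker.io). Under that assumption, unreachableserver/nginx would be looked up on Docker Hub as a repository named unreachableserver/nginx.

```shell
# Print the registry host a Docker-style image reference would be pulled from.
# Simplified rule (an assumption, not the full reference grammar):
# the first path component is a registry only if it contains "." or ":"
# or equals "localhost"; everything else defaults to Docker Hub.
image_registry() {
    ref="$1"
    first="${ref%%/*}"
    # No slash at all: a short name such as "nginx:1.14".
    if [ "$first" = "$ref" ]; then
        echo "docker.io"
        return
    fi
    case "$first" in
        *.*|*:*|localhost) echo "$first" ;;
        *) echo "docker.io" ;;
    esac
}
```

For example, image_registry "172.30.84.25:5000/default/arix" prints the internal registry address from the question, while image_registry "unreachableserver/nginx:1.14.22222" prints docker.io.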
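Note: If you do not see any events of interest and the pod has been in the 'ImagePullBackOff' status for a while (seems like more than 60 minutes), you need to delete the pod and look at the events from the new pod.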
For OpenShift use:
oc delete pod <pod-id>
oc get pods
oc describe pod <new-pod-id>
For vanilla Kubernetes:
kubectl delete pod <pod-id>
kubectl get pods
kubectl describe pod <new-pod-id>
Sample output:
Type     Reason     Age                From               Message
----     ------     ----               ----               -------
Normal   Scheduled  32s                default-scheduler  Successfully assigned rk/nginx-deployment-6c879b5f64-2xrmt to aks-agentpool-x
Normal   Pulling    17s (x2 over 30s)  kubelet            Pulling image "unreachableserver/nginx:1.14.22222"
Warning  Failed     16s (x2 over 29s)  kubelet            Failed to pull image "unreachableserver/nginx:1.14.22222": rpc error: code = Unknown desc = Error response from daemon: pull access denied for unreachableserver/nginx, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Warning  Failed     16s (x2 over 29s)  kubelet            Error: ErrImagePull
Normal   BackOff    5s (x2 over 28s)   kubelet            Back-off pulling image "unreachableserver/nginx:1.14.22222"
Warning  Failed     5s (x2 over 28s)   kubelet            Error: ImagePullBackOff
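When a pod has many events, it can help to reduce output like the sample above to just the images that failed to pull. The following is a small sketch, assuming the event message has the form Failed to pull image "<image>" as shown in the sample; the function name failed_images is hypothetical.

```shell
# Read `describe pod` output on stdin and print each image whose pull
# failed, once. Assumes messages of the form:
#   Failed to pull image "<image>": <reason>
failed_images() {
    sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}
```

It would typically be used as kubectl describe pod <pod-id> | failed_images (or with oc on OpenShift).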
Additional debugging steps
- Try to pull the Docker image and tag manually on your computer.
- Identify the node by running 'kubectl/oc get pods -o wide'.
- SSH into the node that cannot pull the Docker image (if you can).
- Check that the node can resolve the DNS name of the Docker registry, for example by pinging it.
- Try to pull the Docker image manually on the node.
- If you are using a private registry, check that your secret exists and is correct. The secret must also be in the same namespace as the pod. Thanks swenzel.
- Some registries have firewalls that limit access by IP address. The firewall may block the pull.
- Some CI systems create deployments with temporary Docker secrets, so the secret expires after a few days (you are asking for production failures...).
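Some of the steps above can be scripted. The following is a rough sketch, not a complete diagnostic: it looks up the node a pod was scheduled on and checks that every image pull secret named in the pod spec exists in the current namespace. The function names and the KUBE_CLI variable are hypothetical; the jsonpath fields (.spec.nodeName, .spec.imagePullSecrets) are standard pod spec fields. It does not verify that the credentials inside a secret are still valid.

```shell
KUBE_CLI="${KUBE_CLI:-kubectl}"   # assumption: set KUBE_CLI=oc on OpenShift

# Node the pod was scheduled on (the machine that performs the pull).
pod_node() {
    "$KUBE_CLI" get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# Names of the image pull secrets referenced by the pod spec.
pod_pull_secrets() {
    "$KUBE_CLI" get pod "$1" -o jsonpath='{.spec.imagePullSecrets[*].name}'
}

# Succeeds only if every referenced pull secret exists in the
# current namespace; reports the first one that is missing.
check_pull_secrets() {
    for secret in $(pod_pull_secrets "$1"); do
        "$KUBE_CLI" get secret "$secret" >/dev/null 2>&1 || {
            echo "missing secret: $secret"
            return 1
        }
    done
}
```

For example, pod_node <pod-id> prints the node to SSH into, and check_pull_secrets <pod-id> reports a secret that has been deleted or was created in a different namespace.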