I am trying to set up the Kubernetes master by issuing the following command:


  • followed by: Installing a pod network add-on (Calico)
  • followed by: Master Isolation


  • Problem: the coredns pods have a CrashLoopBackOff or Error status:
    # kubectl get pods -n kube-system
    NAME                                       READY   STATUS             RESTARTS   AGE
    calico-node-lflwx                          2/2     Running            0          2d
    coredns-576cbf47c7-nm7gc                   0/1     CrashLoopBackOff   69         2d
    coredns-576cbf47c7-nwcnx                   0/1     CrashLoopBackOff   69         2d
    etcd-suey.nknwn.local                      1/1     Running            0          2d
    kube-apiserver-suey.nknwn.local            1/1     Running            0          2d
    kube-controller-manager-suey.nknwn.local   1/1     Running            0          2d
    kube-proxy-xkgdr                           1/1     Running            0          2d
    kube-scheduler-suey.nknwn.local            1/1     Running            0          2d
    #
    

    I went through Troubleshooting kubeadm - Kubernetes, but my node is not running SELinux and my Docker version is up to date.
    # docker --version
    Docker version 18.06.1-ce, build e68fc7a
    #
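
    For completeness, SELinux status can be confirmed with the standard tools; on a node where SELinux is not enforcing, getenforce reports Disabled or Permissive:
    # getenforce
    # sestatus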
    
    kubectl describe:
    # kubectl -n kube-system describe pod coredns-576cbf47c7-nwcnx
    Name:               coredns-576cbf47c7-nwcnx
    Namespace:          kube-system
    Priority:           0
    PriorityClassName:  <none>
    Node:               suey.nknwn.local/192.168.86.81
    Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
    Labels:             k8s-app=kube-dns
                        pod-template-hash=576cbf47c7
    Annotations:        cni.projectcalico.org/podIP: 192.168.0.30/32
    Status:             Running
    IP:                 192.168.0.30
    Controlled By:      ReplicaSet/coredns-576cbf47c7
    Containers:
      coredns:
        Container ID:  docker://ec65b8f40c38987961e9ed099dfa2e8bb35699a7f370a2cda0e0d522a0b05e79
        Image:         k8s.gcr.io/coredns:1.2.2
        Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
        Ports:         53/UDP, 53/TCP, 9153/TCP
        Host Ports:    0/UDP, 0/TCP, 0/TCP
        Args:
          -conf
          /etc/coredns/Corefile
        State:          Running
          Started:      Wed, 31 Oct 2018 23:28:58 -0400
        Last State:     Terminated
          Reason:       Error
          Exit Code:    137
          Started:      Wed, 31 Oct 2018 23:21:35 -0400
          Finished:     Wed, 31 Oct 2018 23:23:54 -0400
        Ready:          True
        Restart Count:  103
        Limits:
          memory:  170Mi
        Requests:
          cpu:        100m
          memory:     70Mi
        Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
        Environment:  <none>
        Mounts:
          /etc/coredns from config-volume (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      config-volume:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      coredns
        Optional:  false
      coredns-token-xvq8b:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  coredns-token-xvq8b
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     CriticalAddonsOnly
                     node-role.kubernetes.io/master:NoSchedule
                     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age                     From                       Message
      ----     ------     ----                    ----                       -------
      Normal   Killing    54m (x10 over 4h19m)    kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
      Warning  Unhealthy  9m56s (x92 over 4h20m)  kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
      Warning  BackOff    5m4s (x173 over 4h10m)  kubelet, suey.nknwn.local  Back-off restarting failed container
    # kubectl -n kube-system describe pod coredns-576cbf47c7-nm7gc
    Name:               coredns-576cbf47c7-nm7gc
    Namespace:          kube-system
    Priority:           0
    PriorityClassName:  <none>
    Node:               suey.nknwn.local/192.168.86.81
    Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
    Labels:             k8s-app=kube-dns
                        pod-template-hash=576cbf47c7
    Annotations:        cni.projectcalico.org/podIP: 192.168.0.31/32
    Status:             Running
    IP:                 192.168.0.31
    Controlled By:      ReplicaSet/coredns-576cbf47c7
    Containers:
      coredns:
        Container ID:  docker://0f2db8d89a4c439763e7293698d6a027a109bf556b806d232093300952a84359
        Image:         k8s.gcr.io/coredns:1.2.2
        Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
        Ports:         53/UDP, 53/TCP, 9153/TCP
        Host Ports:    0/UDP, 0/TCP, 0/TCP
        Args:
          -conf
          /etc/coredns/Corefile
        State:          Running
          Started:      Wed, 31 Oct 2018 23:29:11 -0400
        Last State:     Terminated
          Reason:       Error
          Exit Code:    137
          Started:      Wed, 31 Oct 2018 23:21:58 -0400
          Finished:     Wed, 31 Oct 2018 23:24:08 -0400
        Ready:          True
        Restart Count:  102
        Limits:
          memory:  170Mi
        Requests:
          cpu:        100m
          memory:     70Mi
        Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
        Environment:  <none>
        Mounts:
          /etc/coredns from config-volume (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      config-volume:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      coredns
        Optional:  false
      coredns-token-xvq8b:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  coredns-token-xvq8b
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     CriticalAddonsOnly
                     node-role.kubernetes.io/master:NoSchedule
                     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason     Age                     From                       Message
      ----     ------     ----                    ----                       -------
      Normal   Killing    44m (x12 over 4h18m)    kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
      Warning  BackOff    4m58s (x170 over 4h9m)  kubelet, suey.nknwn.local  Back-off restarting failed container
      Warning  Unhealthy  8s (x102 over 4h19m)    kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
    #
    
    kubectl logs:
    # kubectl -n kube-system logs -f coredns-576cbf47c7-nm7gc
    E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:31:58.974857       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:32:29.975493       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:32:29.976732       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:32:29.977788       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:33:00.976164       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:33:00.977415       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:33:00.978332       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    2018/11/01 03:33:08 [INFO] SIGTERM: Shutting down servers then terminating
    E1101 03:33:31.976864       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:33:31.978080       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E1101 03:33:31.979156       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    #
    
    # kubectl -n kube-system log -f coredns-576cbf47c7-gqdgd
    .:53
    2018/11/05 04:04:13 [INFO] CoreDNS-1.2.2
    2018/11/05 04:04:13 [INFO] linux/amd64, go1.11, eb51e8b
    CoreDNS-1.2.2
    linux/amd64, go1.11, eb51e8b
    2018/11/05 04:04:13 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
    2018/11/05 04:04:19 [FATAL] plugin/loop: Seen "HINFO IN 3597544515206064936.6415437575707023337." more than twice, loop detected
    # kubectl -n kube-system log -f coredns-576cbf47c7-hhmws
    .:53
    2018/11/05 04:04:18 [INFO] CoreDNS-1.2.2
    2018/11/05 04:04:18 [INFO] linux/amd64, go1.11, eb51e8b
    CoreDNS-1.2.2
    linux/amd64, go1.11, eb51e8b
    2018/11/05 04:04:18 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
    2018/11/05 04:04:24 [FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
    #
    
    describe (apiserver):
    # kubectl -n kube-system describe pod kube-apiserver-suey.nknwn.local
    Name:               kube-apiserver-suey.nknwn.local
    Namespace:          kube-system
    Priority:           2000000000
    PriorityClassName:  system-cluster-critical
    Node:               suey.nknwn.local/192.168.87.20
    Start Time:         Fri, 02 Nov 2018 00:28:44 -0400
    Labels:             component=kube-apiserver
                        tier=control-plane
    Annotations:        kubernetes.io/config.hash: 2433a531afe72165364aace3b746ea4c
                        kubernetes.io/config.mirror: 2433a531afe72165364aace3b746ea4c
                        kubernetes.io/config.seen: 2018-11-02T00:28:43.795663261-04:00
                        kubernetes.io/config.source: file
                        scheduler.alpha.kubernetes.io/critical-pod:
    Status:             Running
    IP:                 192.168.87.20
    Containers:
      kube-apiserver:
        Container ID:  docker://659456385a1a859f078d36f4d1b91db9143d228b3bc5b3947a09460a39ce41fc
        Image:         k8s.gcr.io/kube-apiserver:v1.12.2
        Image ID:      docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
        Port:          <none>
        Host Port:     <none>
        Command:
          kube-apiserver
          --authorization-mode=Node,RBAC
          --advertise-address=192.168.87.20
          --allow-privileged=true
          --client-ca-file=/etc/kubernetes/pki/ca.crt
          --enable-admission-plugins=NodeRestriction
          --enable-bootstrap-token-auth=true
          --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
          --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
          --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
          --etcd-servers=https://127.0.0.1:2379
          --insecure-port=0
          --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
          --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
          --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
          --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
          --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
          --requestheader-allowed-names=front-proxy-client
          --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
          --requestheader-extra-headers-prefix=X-Remote-Extra-
          --requestheader-group-headers=X-Remote-Group
          --requestheader-username-headers=X-Remote-User
          --secure-port=6443
          --service-account-key-file=/etc/kubernetes/pki/sa.pub
          --service-cluster-ip-range=10.96.0.0/12
          --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
          --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
        State:          Running
          Started:      Sun, 04 Nov 2018 22:57:27 -0500
        Last State:     Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Sun, 04 Nov 2018 20:12:06 -0500
          Finished:     Sun, 04 Nov 2018 22:55:24 -0500
        Ready:          True
        Restart Count:  2
        Requests:
          cpu:        250m
        Liveness:     http-get https://192.168.87.20:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
        Environment:  <none>
        Mounts:
          /etc/ca-certificates from etc-ca-certificates (ro)
          /etc/kubernetes/pki from k8s-certs (ro)
          /etc/ssl/certs from ca-certs (ro)
          /usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
          /usr/share/ca-certificates from usr-share-ca-certificates (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      etc-ca-certificates:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/ca-certificates
        HostPathType:  DirectoryOrCreate
      k8s-certs:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/kubernetes/pki
        HostPathType:  DirectoryOrCreate
      ca-certs:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/ssl/certs
        HostPathType:  DirectoryOrCreate
      usr-share-ca-certificates:
        Type:          HostPath (bare host directory volume)
        Path:          /usr/share/ca-certificates
        HostPathType:  DirectoryOrCreate
      usr-local-share-ca-certificates:
        Type:          HostPath (bare host directory volume)
        Path:          /usr/local/share/ca-certificates
        HostPathType:  DirectoryOrCreate
    QoS Class:         Burstable
    Node-Selectors:    <none>
    Tolerations:       :NoExecute
    Events:            <none>
    #
    

    System log (host):



    Please advise.

    Best Answer

    This error

    [FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
    

    is raised when CoreDNS detects a loop in the resolve configuration, and it is the intended behavior. You are hitting one of the following issues:

    https://github.com/kubernetes/kubeadm/issues/1162

    https://github.com/coredns/coredns/issues/2087
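
    In concrete terms (an illustration, not necessarily your exact files): the default Corefile forwards anything CoreDNS cannot answer itself to the resolv.conf that kubelet hands to the pod, so a loopback nameserver in that file sends the query straight back to CoreDNS, which is the only resolver listening on 127.0.0.1:53 inside the pod:
    # relevant line in the default kubeadm Corefile: upstream queries are forwarded via the pod's resolv.conf
    proxy . /etc/resolv.conf
    # resolv.conf handed to the pod by kubelet, pointing at a local stub resolver on the node
    nameserver 127.0.0.1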

    Hacky solution: disable the CoreDNS loop detection

    Edit the CoreDNS configmap:
    kubectl -n kube-system edit configmap coredns
    

    Delete or comment out the line containing loop, then save and exit.
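
    For orientation, the stock Corefile that kubeadm generates for CoreDNS 1.2.x looks roughly like the sketch below (your ConfigMap may differ slightly); the line to remove or comment out is the bare loop directive:
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        #loop
        reload
        loadbalance
    }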

    Then delete the CoreDNS pods so that new ones can be created with the new configuration:
    kubectl -n kube-system delete pod -l k8s-app=kube-dns
    

    Everything should be fine after that.
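
    To confirm, one can watch the replacement pods come up and re-check their logs, for example:
    kubectl -n kube-system get pods -l k8s-app=kube-dns -w
    kubectl -n kube-system logs -l k8s-app=kube-dns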

    Preferred solution: remove the loop in the DNS configuration

    First, check whether you are using systemd-resolved. If you are running Ubuntu 18.04, this is likely the case.
    systemctl list-unit-files | grep enabled | grep systemd-resolved
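
    On a host where it is in use, this prints something like (illustrative):
    systemd-resolved.service                   enabled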
    

    If so, check which resolv.conf file your cluster is using as a reference:
    ps auxww | grep kubelet
    

    You may see a line similar to the following:
    /usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf
    

    The important part is --resolv-conf: it tells us whether the systemd-managed resolv.conf is being used.
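
    If the flag does not show up on the kubelet command line, it may be set in the kubelet configuration file instead; on kubeadm-provisioned nodes the usual places to grep are (paths assumed, adjust as needed):
    grep -i resolv /var/lib/kubelet/config.yaml /var/lib/kubelet/kubeadm-flags.env 2>/dev/null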

    If it is the resolv.conf of systemd, do the following:

    Check the content of /run/systemd/resolve/resolv.conf and see whether there is a record like:
    nameserver 127.0.0.1
    

    If 127.0.0.1 is present, it is what causes the loop.

    To get rid of it, you should not edit that file directly; instead, check the other places from which it is generated so that it ends up correct.

    Check all files under /etc/systemd/network, and if you find a record like
    DNS=127.0.0.1
    

    delete that record. Also check /etc/systemd/resolved.conf and do the same there if necessary. Make sure at least one or two DNS servers are configured, for example
    DNS=1.1.1.1 1.0.0.1
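
    A quick way to locate every such DNS= entry in one pass (a sketch, assuming the standard paths; adjust for your distribution):
    grep -rn '^DNS=' /etc/systemd/network/ /etc/systemd/resolved.conf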
    

    After doing all of this, restart the systemd services to put your changes into effect:
    systemctl restart systemd-networkd systemd-resolved

    After that, verify that the 127.0.0.1 entry is no longer present in the resolv.conf file:
    cat /run/systemd/resolve/resolv.conf
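
    With the corrected configuration it should now list only real upstream resolvers, for example (illustrative, depending on what was configured above):
    nameserver 1.1.1.1
    nameserver 1.0.0.1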
    

    Finally, trigger re-creation of the DNS pods:
    kubectl -n kube-system delete pod -l k8s-app=kube-dns
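
    Once the new pods are Running, cluster DNS can be sanity-checked from a throwaway pod (a common verification step, not part of the original answer):
    kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default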
    

    Summary: the fix involves removing what looks like a DNS lookup loop from the host's DNS configuration. The exact steps vary between different resolv.conf managers/implementations.

    Regarding "docker - coredns pod has CrashLoopBackOff or Error status", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53075796/
