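A few days ago, containers could no longer be pinged across hosts, which left one node's network cut off and unreachable. The troubleshooting process was as follows.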

  1. First, confirm that the host node2 can ping the container

    [root@node2 ~]# ping 10.1.19.3
    PING 10.1.19.3 (10.1.19.3) 56(84) bytes of data.
    64 bytes from 10.1.19.3: icmp_seq=1 ttl=64 time=0.122 ms
    64 bytes from 10.1.19.3: icmp_seq=2 ttl=64 time=0.073 ms

      
    The ping succeeds, so move on to the next step.

  2. Check whether the proxy host node1 can ping the container

    [root@node1 ~]# ping 10.1.19.3
    PING 10.1.19.3 (10.1.19.3) 56(84) bytes of data.
    ^C
    --- 10.1.19.3 ping statistics ---
    14 packets transmitted, 0 received, 100% packet loss, time 12999ms

      

    The ping fails, so check the proxy host.

  3. Check whether the flannel subnet configuration seen from the proxy host is correct

    [root@node1 ~]# etcdctl ls /coreos.com/network/subnets
    /coreos.com/network/subnets/10.1.91.0-24
    /coreos.com/network/subnets/10.1.93.0-24
    /coreos.com/network/subnets/10.1.94.0-24
    /coreos.com/network/subnets/10.1.19.0-24
    /coreos.com/network/subnets/10.1.77.0-24

      

    The subnet configuration is correct: it already contains the 10.1.19.0-24 subnet.
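
To make this check repeatable, the lookup can be wrapped in a small helper. This is only a minimal sketch and was not part of the original troubleshooting session: `has_subnet_lease` is a hypothetical name, and it assumes the listing has the same format as the `etcdctl ls` output above, with keys under `/coreos.com/network/subnets`.

```shell
#!/bin/sh
# has_subnet_lease LISTING SUBNET   (hypothetical helper, for illustration)
#   LISTING: text in the format printed by
#            `etcdctl ls /coreos.com/network/subnets`
#   SUBNET:  subnet in the key form shown above, e.g. 10.1.19.0-24
# Returns 0 if the listing contains exactly that subnet key, 1 otherwise.
has_subnet_lease() {
    # -x: match the whole line, -F: treat the key as a fixed string, -q: quiet
    printf '%s\n' "$1" | grep -qxF "/coreos.com/network/subnets/$2"
}
```

For example, `has_subnet_lease "$(etcdctl ls /coreos.com/network/subnets)" 10.1.19.0-24` should succeed for the listing shown in this step.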

  4. Go back and check whether the routing table on the host is complete

    [root@node2 ~]# route -n
    Kernel IP routing table
    Destination Gateway Genmask Flags Metric Ref Use Iface
    0.0.0.0 192.168.19.51 0.0.0.0 UG 100 0 0 eth0
    10.1.19.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
    192.168.19.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0

      

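    The routing table is incomplete: the flannel route is missing, so traffic to the rest of 10.1.0.0/16 would leave through the default gateway on eth0.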
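
The same inspection can be scripted. The following is a rough sketch under stated assumptions: `has_flannel_route` is a hypothetical helper, it expects text in the `route -n` format shown above (interface name in the last column), and it only looks for an interface whose name starts with a given prefix (`flannel` by default).

```shell
#!/bin/sh
# has_flannel_route TABLE [IFACE_PREFIX]   (hypothetical helper, for illustration)
#   TABLE:        text in the format printed by `route -n`
#   IFACE_PREFIX: interface name prefix to look for (default: flannel)
# Returns 0 if at least one line ends in an interface name that starts
# with IFACE_PREFIX, 1 otherwise.
has_flannel_route() {
    printf '%s\n' "$1" | awk -v p="${2:-flannel}" '
        index($NF, p) == 1 { found = 1 }   # last field begins with the prefix
        END { exit !found }                # exit status 0 only when found
    '
}
```

On the host it could be called as `has_flannel_route "$(route -n)" || echo "flannel route missing"`.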

  5. Try restarting flannel; if it does not recreate the route automatically, add the route manually

    [root@node2 ~]# route add -net 10.1.0.0 netmask 255.255.0.0 dev flannel0
    [root@node2 ~]# route -n
    Kernel IP routing table
    Destination Gateway Genmask Flags Metric Ref Use Iface
    0.0.0.0 192.168.19.51 0.0.0.0 UG 100 0 0 eth0
    10.1.0.0 0.0.0.0 255.255.0.0 U 0 0 0 flannel0
    10.1.19.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
    192.168.19.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0

      

    Once the route has been added, check again whether the network is reachable.

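  6. Check the network again. If it is still unreachable, note that traffic between the flannel interface (flannel0 above, or flannel.1 with the VXLAN backend) and docker0 is forwarded by the kernel and passes through the iptables FORWARD chain, so make sure that:
    1. IP forwarding is enabled in the kernel (this takes effect immediately but is lost after a reboot)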

      echo "1" > /proc/sys/net/ipv4/ip_forward
      

        

    2. Packets are not blocked by the iptables FORWARD rules

      sudo iptables -P FORWARD ACCEPT
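
Because the `echo` into `/proc` does not survive a reboot, one common way to keep forwarding enabled is to record the setting in a sysctl configuration file. The sketch below is an assumption about how that could be done, not something the original session ran: `persist_ip_forward` is a hypothetical helper, and `/etc/sysctl.conf` is assumed as the default file.

```shell
#!/bin/sh
# persist_ip_forward [FILE]   (hypothetical helper, for illustration)
#   FILE: sysctl configuration file to edit (assumed default: /etc/sysctl.conf)
# Ensures FILE contains the line "net.ipv4.ip_forward = 1".
# Running it repeatedly does not add duplicate lines.
persist_ip_forward() {
    conf="${1:-/etc/sysctl.conf}"
    if grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=' "$conf" 2>/dev/null; then
        # An active entry already exists: rewrite it via a temporary file
        sed -E 's/^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=.*/net.ipv4.ip_forward = 1/' \
            "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" &&
            rm -f "$conf.tmp"
    else
        # No active entry yet: append one
        printf 'net.ipv4.ip_forward = 1\n' >> "$conf"
    fi
}
```

Run as root, `persist_ip_forward && sysctl -p` would write the setting and load it from `/etc/sysctl.conf`. The `iptables -P` policy change is likewise not saved across reboots by itself.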
      
  7. Confirm that the network is now reachable

    [root@node1 ~]# ping 10.1.19.3
    PING 10.1.19.3 (10.1.19.3) 56(84) bytes of data.
    64 bytes from 10.1.19.3: icmp_seq=1 ttl=61 time=0.444 ms
    64 bytes from 10.1.19.3: icmp_seq=2 ttl=61 time=0.288 ms
    ^C
    --- 10.1.19.3 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 999ms
    rtt min/avg/max/mdev = 0.288/0.366/0.444/0.078 ms

      

    The network is working again.

That's all.

05-20 12:16