问题描述
这个问题在所有类型的论坛上已经被问过很多次了,但是不幸的是,到目前为止,没有一个答案对我有帮助。
This question has been asked many many times before on all types of fora but unfortunately, none of the answers have helped me so far.
我会正确地
OS: RHEL 7.7 Maipo
Docker Version: Engline/Client 18.09.7
My configuration:
Host 1: IP 169.192.215.74
Host 2: IP 10.210.87.16
**On Host1:**
# netstat -plntu|grep -E "4789|7946|2377"
tcp6 0 0 :::2377 :::* LISTEN 3576/dockerd
tcp6 0 0 :::7946 :::* LISTEN 3576/dockerd
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
udp6 0 0 :::7946 :::* 3576/dockerd
# docker swarm init --advertise-addr=169.192.215.74
# docker network create --attachable --driver overlay syndichain
# docker swarm join-token manager
docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377
Ping Host 2:
# ping 10.210.87.16
PING 10.210.87.16 (10.210.87.16) 56(84) bytes of data.
64 bytes from 10.210.87.16: icmp_seq=1 ttl=119 time=0.804 ms
64 bytes from 10.210.87.16: icmp_seq=2 ttl=119 time=1.49 ms
64 bytes from 10.210.87.16: icmp_seq=3 ttl=119 time=0.646 ms
64 bytes from 10.210.87.16: icmp_seq=4 ttl=119 time=0.664 ms
^C
--- 10.210.87.16 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.646/0.901/1.493/0.348 ms
# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:8d:a0:60 brd ff:ff:ff:ff:ff:ff
inet 169.192.215.74/24 brd 169.192.215.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:a9:99:bf:c2 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
76: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ab:17:1e:2f brd ff:ff:ff:ff:ff:ff
inet 172.19.0.1/16 brd 172.19.255.255 scope global docker_gwbridge
valid_lft forever preferred_lft forever
468: veth49ae88e@if467: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
link/ether 82:e1:f6:1e:72:ec brd ff:ff:ff:ff:ff:ff link-netnsid 1
Bring up two containers (RHEL 7 container from our local repository)
# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.da85 957c8834c8a9 bash
# docker run --rm -it --network="syndichain" -p 11003:11003 --name rhel.da85.2 957c8834c8a9 bash
在主机2上:
# docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377 --advertise-addr=10.210.87.216
# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
yxphc861xra6ajc5llxrpp9ia * sd-cfd1-0319 Ready Active Reachable 18.09.7
2c45co3pj38u2gwqrvuawwi11 sd-d5d8-da85 Ready Active Leader 18.09.7
# netstat -plntu|grep -E "4789|7946|2377"
tcp6 0 0 :::2377 :::* LISTEN 8562/dockerd
tcp6 0 0 :::7946 :::* LISTEN 8562/dockerd
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
udp6 0 0 :::7946 :::* 8562/dockerd
Ping Host 1:
# ping 169.192.215.74
PING 169.192.215.74 (169.192.215.74) 56(84) bytes of data.
64 bytes from 169.192.215.74: icmp_seq=1 ttl=55 time=0.796 ms
64 bytes from 169.192.215.74: icmp_seq=2 ttl=55 time=0.530 ms
64 bytes from 169.192.215.74: icmp_seq=3 ttl=55 time=0.513 ms
^C
--- 169.192.215.74 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.513/0.613/0.796/0.129 ms
# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:98:63:cb brd ff:ff:ff:ff:ff:ff
inet 10.210.87.216/23 brd 10.210.87.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:20:4a:de:3f brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
871: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:f0:d9:46:4c brd ff:ff:ff:ff:ff:ff
inet 172.23.0.1/16 brd 172.23.255.255 scope global docker_gwbridge
valid_lft forever preferred_lft forever
885: veth4d635e9@if884: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
link/ether 86:4f:5b:81:89:5c brd ff:ff:ff:ff:ff:ff link-netnsid 1
892: veth2734e0c@if891: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
link/ether b2:6c:1d:94:99:3f brd ff:ff:ff:ff:ff:ff link-netnsid 4
Bring up two RHEL7 containers on Host 2:
# docker run --rm -it --network="syndichain" --name rhel.0319 -p 11001:11001 957c8834c8a9 bash
# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.0319.2 957c8834c8a9 bash
现在,我们运行一些命令来检查覆盖网络:
On Host 1:
# docker network inspect syndichain
[
{
"Name": "syndichain",
"Id": "8ar2ar0z0brl5lp1lk13xo4ik",
"Created": "2020-02-28T11:44:01.411177608-05:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"fb911a7538d05f116d7f3ed50192e198d93faadaf1fb98f55fa79eb4cfbea72d": {
"Name": "rhel.da85",
"EndpointID": "8caf50c904f660c750bfa3608902d78239211e7076afed68d77a46e3e141386d",
"MacAddress": "02:42:0a:00:00:02",
"IPv4Address": "10.0.0.2/24",
"IPv6Address": ""
},
"lb-syndichain": {
"Name": "syndichain-endpoint",
"EndpointID": "0a122599aee9d9b84dde6383aeb432580769af886278b46310a1c65bc68a0bc9",
"MacAddress": "02:42:0a:00:00:03",
"IPv4Address": "10.0.0.3/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "7f037ddf9a5f",
"IP": "10.210.87.216"
},
{
"Name": "b1b424f73d55",
"IP": "169.192.215.74"
}
]
}
]
On Host 2:
# docker network inspect syndichain
[
{
"Name": "syndichain",
"Id": "8ar2ar0z0brl5lp1lk13xo4ik",
"Created": "2020-02-28T11:42:08.565999966-05:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"351d29af85e6e8addafcf69b3c464b49f8c9720cbaf8911328ab1aadce13e170": {
"Name": "rhel.0319.2",
"EndpointID": "fe0ed40c7a0b8687ca4a34d7b0196e382c32f3a5f82c43233174da349a8a5b34",
"MacAddress": "02:42:0a:00:00:04",
"IPv4Address": "10.0.0.4/24",
"IPv6Address": ""
},
"62493b5072d9c2e516a0b8bfb9d0ba6dea9c2cbca2787bf4f8c2bf5ffe021dd0": {
"Name": "rhel.0319",
"EndpointID": "9a5e8977ca434854da886c215f14583cc5a7c5bb8811d5408a5cae9c4a325c47",
"MacAddress": "02:42:0a:00:00:0e",
"IPv4Address": "10.0.0.14/24",
"IPv6Address": ""
},
"lb-syndichain": {
"Name": "syndichain-endpoint",
"EndpointID": "37415feb73f86cc3ceb1ab4d060da076ed4fca1f0b3ebe440993c82c14258a2d",
"MacAddress": "02:42:0a:00:00:0f",
"IPv4Address": "10.0.0.15/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "7f037ddf9a5f",
"IP": "10.210.87.216"
},
{
"Name": "b1b424f73d55",
"IP": "169.192.215.74"
}
]
}
]
现在,我将演示问题:
Ping rhel.da85.2 container from rhel.da85 container (both containers on the same host)
[root@fb911a7538d0 /]# ping rhel.da85.2
PING rhel.da85.2 (10.0.0.5) 56(84) bytes of data.
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=1 ttl=64 time=0.080 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=3 ttl=64 time=0.048 ms
^C
--- rhel.da85.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.048/0.067/0.080/0.015 ms
Ping rhel.0319 container from rhel.da85 container (containers on different hosts)
[root@fb911a7538d0 /]# ping rhel.0319
PING rhel.0319 (10.0.0.14) 56(84) bytes of data.
It is stuck at this. Nothing happens.
Ping rhel.0319.2 container from rhel.0319 container (both containers on the same host)
[root@62493b5072d9 /]# ping rhel.0319.2
PING rhel.0319.2 (10.0.0.4) 56(84) bytes of data.
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=1 ttl=64 time=0.109 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=3 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=4 ttl=64 time=0.068 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=5 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=6 ttl=64 time=0.063 ms
^C
--- rhel.0319.2 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.061/0.071/0.109/0.018 ms
Ping rhel.da85 container from rhel.0319 container (both containers on different hosts)
[root@62493b5072d9 /]# ping rhel.da85
PING rhel.da85 (10.0.0.2) 56(84) bytes of data.
It is stuck at this point, just like it is stuck pinging a container on 0319 from da85.
我甚至根据
但是,过去仍然会遇到此问题周。感谢任何帮助。
However, still stuck with this issue for the past week. Appreciate any help.
更新:
有人建议运行tcpdump和nsenter。这是在容器级别(在生成PING请求的同一VM上)运行的tcpdump日志,以及 ARP的nsenter日志。您会看到ARP请求从未在tcpdump中得到响应,但是nsenter显示了ARP的请求/答复对。
Someone suggested running tcpdump and nsenter. Here are the logs from the tcpdump running at the container level (on the same VM where the PING request is being generated) and nsenter logs for "ARP". You can see that the ARP request never gets a response in tcpdump but nsenter shows request/reply pairs for ARP.
[root@0ae70bfa66ae /]# tcpdump -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:26:03.276397 IP (tos 0x0, ttl 64, id 34495, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 1, length 64
16:26:04.276569 IP (tos 0x0, ttl 64, id 34971, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 2, length 64
16:26:05.276526 IP (tos 0x0, ttl 64, id 35636, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 3, length 64
16:26:06.276545 IP (tos 0x0, ttl 64, id 35935, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 4, length 64
16:26:07.276542 IP (tos 0x0, ttl 64, id 36706, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 5, length 64
16:26:08.276543 IP (tos 0x0, ttl 64, id 37284, offset 0, flags [DF], proto ICMP (1), length 84)
rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 6, length 64
16:26:08.294475 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has rhel.da85.syndichain tell rhel.0319.syndichain, length 28
# nsenter --net=/var/run/docker/netns/1-khyguu9i6n tcpdump -peni any "arp"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
11:45:23.302460 P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302471 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302485 In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:23.302488 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294458 P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294474 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294477 In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294479 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302460 P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302480 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302483 In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302485 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
更新2:
From container on Source Host
[root@2baddaa98452 /]# nc -zvu rhel.da85 4789
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.0.2:4789.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
Source Host TCP Dump
# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
11:28:11.706587 IP sd-cfd1-0319.59121 > sd-d5d8-da85.4789: VXLAN, flags [I] (0x08), vni 4097
IP 10.0.0.11.44806 > 10.0.0.2.4789: [|VXLAN]
Target Host TCP Dump
# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
推荐答案
从您的评论看来,覆盖网络端口在节点之间被阻塞。这可能来自主机上的iptables或其他防火墙工具,节点之间的网络防火墙,或其他软件(例如VM工具或云路由器ACL)阻止了这些端口。需要打开的端口是:
From your comment, it appears that the overlay networking ports are blocked between nodes. This could be from iptables or another firewall tool on the host, a network firewall between the nodes, or other software like VM tooling or a cloud router ACL, blocking those ports. The ports that need to be opened are:
参考:
这篇关于无法使用覆盖网络在另一台主机上ping Docker容器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!