排查问题思路
一般出现这种状况都是网卡mac地址错误引起的!要么网卡配置文件中的mac地址不对,要么/etc/udev/rules.d/70-persistent-net.rules文件中的mac地址不对!!!
问题现象描述
- bond网卡地址ping不通;
- 交换机侧看对应端口状态如下(无关信息省略)
<CL202-R04F02-H3CS7610-SW01>display interface Ten-GigabitEthernet 1/2/0/4
Ten-GigabitEthernet1/2/0/4
Current state: UP
Line protocol state: UP
IP packet frame type: Ethernet II, hardware address: 7057-bf25-8a00
......
<CL202-R04F02-H3CS7610-SW01>display interface Ten-GigabitEthernet 2/2/0/4
Ten-GigabitEthernet2/2/0/4
Current state: UP
Line protocol state: DOWN(LAGG)
IP packet frame type: Ethernet II, hardware address: 7057-bf24-b800
......
- 在配置bond的两张网卡上执行
ifconfig eth2 up
和ifconfig eth3 up
都报类似的错:eth2: unknown interface: No such device;
故障分析定位
- 从故障现象描述第3条手动UP网卡的报错信息以及交换机侧看对应端口的信息,基本可以排除是交换机侧的故障和物理链路故障,主要排查服务器侧的故障;一般此问题是服务器网卡的MAC地址不对造成的。
故障排查过程
- 查看网卡
如下,我们可以看到系统中有4张网卡,eth0、eth1、eth2和eth3:
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# ll ifcfg-*
-rw-r--r--. 1 root root 196 Mar 23 15:34 ifcfg-bond0
-rw-r--r-- 1 root root 328 Mar 23 21:02 ifcfg-eth0
-rw-r--r--. 1 root root 212 Mar 23 15:30 ifcfg-eth1
-rw-r--r-- 1 root root 117 May 7 16:58 ifcfg-eth2
-rw-r--r-- 1 root root 117 May 7 16:58 ifcfg-eth3
-rw-r--r--. 1 root root 254 Apr 27 2018 ifcfg-lo
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
- 查看
/etc/udev/rules.d/70-persistent-net.rules
文件内容如下
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# more /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:37", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:38", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:c5:a8:28", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:c5:a8:29", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:49", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:48", ATTR{type}=="1", KERNEL=="eth*", NAME="eth5"
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
- 发现的问题:在网卡配置文件目录下只有eth0、eth1、eth2和eth3这4张网卡,但是在/etc/udev/rules.d/70-persistent-net.rules文件中发现竟然多了eth4和eth5这2张网卡;并且查看eth2和eth3网卡配置文件时发现其mac地址和/etc/udev/rules.d/70-persistent-net.rules文件中显示的eth2和eth3文件的mac地址不一样;eth2和eth3配置文件内容如下:
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth2
DEVICE="eth2"
#HWADDR="6c:92:bf:c5:a8:28"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth2"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth3
DEVICE="eth3"
#HWADDR="6c:92:bf:c5:a8:29"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth3"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
可以从以上信息看出,eth2和eth3网卡配置文件中的mac地址和/etc/udev/rules.d/70-persistent-net.rules中eth2和eth3中的mac地址不一样;
远程登录IPMI查看主机mac地址信息如下图:
从上述信息可以判定配置文件中eth2和eth3的mac地址信息是错的
造成mac地址错误的原因
之前这台设备报修过,更换过网卡文件,所以网卡的mac地址变了;但是/etc/udev/rules.d/70-persistent-net.rules和网卡配置文件中eth2和eth3的mac地址没有对应更新,而是异常新增了并不存在的eth4和eth5网卡,而实际的bond配置还是使用的eth2和eth3网卡,所以网络异常,手动UP网卡报错unknown interface: No such device。
解决办法
修改网卡配置文件和/etc/udev/rules.d/70-persistent-net.rules,修改后正确配置如下:
- /etc/udev/rules.d/70-persistent-net.rules
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:37", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:38", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:48", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:49", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
- ifcfg-eth2
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth2
DEVICE="eth2"
#HWADDR="6c:92:bf:a3:ac:48"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth2"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
- ifcfg-eth3
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth3
DEVICE="eth3"
#HWADDR="6c:92:bf:a3:ac:49"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth3"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
关键最后一步 --- 重启主机
修改配置文件后,尝试过重启网卡,但是依旧未成功,所以尝试了重启主机后世界豁然开朗,网络马上ojbk。
注:没修改mac地址之前重启网卡也是无效的。