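Continuing from the previous section. Picking up where we left off:

First: the documentation mentions three kernel modules, uio_pci_generic, igb_uio and vfio_pci, which I did not understand at all, as well as dpdk-devbind.py for checking NIC status. Running it gave me the following output: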

[root@dpdk tools]# ./dpdk-devbind.py --status
Network devices using DPDK-compatible driver
============================================
<none>
Network devices using kernel driver
===================================
::03.0 'Virtio network device' if= drv=virtio-pci unused=
Other network devices
=====================
<none>
[root@dpdk tools]#
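
The "drv=" column in that output can also be read straight from sysfs, without dpdk-devbind.py. A minimal sketch, assuming the usual /sys/bus/pci/devices layout; the function name pci_driver and the optional second argument (an alternative sysfs root, there only so the function can be tried on a fake tree) are made up for illustration:

```shell
#!/bin/sh
# pci_driver ADDR [SYSFS_ROOT]
# Print the kernel driver currently bound to PCI device ADDR, or "<none>".
# SYSFS_ROOT defaults to /sys (hypothetical parameter, for experimenting only).
pci_driver() {
    addr=$1
    root=${2:-/sys}
    link="$root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/virtio-pci
        basename "$(readlink "$link")"
    else
        # No "driver" symlink means no driver is bound to the device
        echo "<none>"
    fi
}
```

For example, `pci_driver 0000:00:03.0` inside the guest would be expected to print virtio-pci, assuming that is the full address of the virtio NIC listed above.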

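So first I need to learn about QEMU's NIC settings, tweak the hardware, and then come back~~ (off I went, miserably, to man qemu...)

Until now I had only ever used QEMU networking one way: a tap outside, a virtio inside.

Done with the man page and back. Guest hardware can be emulated with "-net nic,model=xxx", but I still don't know how to do passthrough.

1. With virtio as the front-end driver, how to make the back end use vhost-user

Then it dawned on me how complicated this is, so I decided to give it its own post. Moved to "[qemu] With virtio as the front-end driver, how to make the back end use vhost-user".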

2. Direct device access: PCI passthrough

http://blog.csdn.net/qq123386926/article/details/47757089

http://blog.csdn.net/halcyonbaby/article/details/37776211

http://blog.csdn.net/richardysteven/article/details/9008971

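There are two methods, pci-stub and VFIO; I only use the newer one, VFIO. I am going to hand my physical network port to the virtual machine for direct access.

1. Make sure the CPU supports VT-d and that it is enabled in the BIOS.

My CPU does support it: http://ark.intel.com/products/85214/Intel-Core-i7-5500U-Processor-4M-Cache-up-to-3_00-GHz

2. Edit the GRUB configuration so the kernel boots with intel_iommu=on (there is a pitfall here; keep reading, it is covered later in this post)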

[tong@T7 dpdk]$ zcat /proc/config.gz  |grep -i intel_iommu
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_SVM=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
[tong@T7 dpdk]$

3. Load the vfio-pci driver into the kernel.

[tong@T7 dpdk]$ sudo modprobe vfio-pci
[tong@T7 dpdk]$ lsmod |grep vfio
vfio_pci
vfio_iommu_type1
vfio_virqfd vfio_pci
vfio vfio_iommu_type1,vfio_pci
irqbypass kvm,vfio_pci
[tong@T7 dpdk]$

4. Look at the NIC information

[root@T7 0000:00:19.0]# lspci -vv -nn -d :15a3
:19.0 Ethernet controller []: Intel Corporation Ethernet Connection () I218-V [:15a3] (rev )
Subsystem: Lenovo Device [17aa:]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ
Region : Memory at f2200000 (-bit, non-prefetchable) [size=128K]
Region : Memory at f223e000 (-bit, non-prefetchable) [size=4K]
Region : I/O ports at [size=]
Capabilities: [c8] Power Management version
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel= DScale= PME-
Capabilities: [d0] MSI: Enable- Count=/ Maskable- 64bit+
Address: Data:
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel modules: e1000e

5. bind / unbind

[root@T7 0000:00:19.0]# echo "0000:00:19.0" > /sys/bus/pci/devices/0000\:00\:19.0/driver/unbind
[root@T7 0000:00:19.0]# echo "8086 15a3" > /sys/bus/pci/drivers/vfio-pci/new_id

*** Here comes the problem. Going by the documentation, something is already off: I have no iommu_group. What on earth is that... ***

[tong@T7 dpdk]$ ls /dev/vfio/
vfio
[tong@T7 dpdk]$ dmesg |grep vfio
[20355.407062] vfio-pci: probe of ::19.0 failed with error -
[20593.172116] vfio-pci: probe of ::19.0 failed with error -
[20684.750370] vfio-pci: probe of ::19.0 failed with error -
[tong@T7 dpdk]$

I started the VM as follows, and it failed with an error:

[tong@T7 dpdk]$ cat start.sh
sudo qemu-system-x86_64 -enable-kvm \
-m 2G -cpu Nehalem -smp cores=,threads=,sockets= \
-numa node,mem=1G,cpus=-,nodeid= \
-numa node,mem=1G,cpus=-,nodeid= \
-drive file=disk.img,if=virtio \
-net nic,model=virtio,macaddr='00:00:00:00:00:03' \
-device vfio-pci,host='0000:00:19.0' \
-net tap,ifname=tap0 &
[tong@T7 dpdk]$ ./start.sh
[tong@T7 dpdk]$ qemu-system-x86_64: -device vfio-pci,host=::19.0: vfio: error no iommu_group for device
qemu-system-x86_64: -device vfio-pci,host=::19.0: Device initialization failed

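Answer:

To answer this question I read the kernel documentation and then this really good article from IBM. I finally understood what an IOMMU group actually is, but I still had not found the answer.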

https://www.kernel.org/doc/Documentation/vfio.txt

https://www.ibm.com/developerworks/community/blogs/5144904d-5d75-45ed-9d2b-cf1754ee936a/entry/vfio?lang=en

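So why was there no iommu_group? Because I was being stupid! I had not added the kernel parameter intel_iommu=on in GRUB as step 2 says. Why not? Because I had run zcat /proc/config.gz, saw CONFIG_INTEL_IOMMU=y, and took that to mean it was enabled. After I added the parameter I ran zcat /proc/config.gz again, and the two outputs were identical. So I had completely misunderstood what this file is for. It only records the options the kernel was compiled with and has nothing to do with the runtime state! The driver is compiled in, but since CONFIG_INTEL_IOMMU_DEFAULT_ON is not set, it stays off unless the boot parameter turns it on.

So, after changing the parameter, this is what it looks like right after boot, which means it took effect:

[tong@T7 ~]$ ll /sys/bus/pci/devices/0000\:00\:19.0/ |grep io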
lrwxrwxrwx root root Sep : iommu -> ../../virtual/iommu/dmar1
lrwxrwxrwx root root Sep : iommu_group -> ../../../kernel/iommu_groups/
[tong@T7 ~]$
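
Since /proc/config.gz says nothing about the running kernel, the runtime state has to be checked elsewhere. A minimal sketch, assuming an Intel machine where the IOMMU is switched on with intel_iommu=on as described above; the function name iommu_enabled and its two optional path arguments are hypothetical and only there so the check can be tried on sample files:

```shell
#!/bin/sh
# iommu_enabled [CMDLINE_FILE] [GROUPS_DIR]
# Succeeds when intel_iommu=on is on the running kernel's command line
# and the kernel has created at least one IOMMU group.
iommu_enabled() {
    cmdline=${1:-/proc/cmdline}
    groups_dir=${2:-/sys/kernel/iommu_groups}
    # /proc/cmdline holds the parameters the running kernel was booted with
    grep -qw 'intel_iommu=on' "$cmdline" || return 1
    # With the IOMMU active, this directory has one entry per group
    [ -n "$(ls -A "$groups_dir" 2>/dev/null)" ]
}
```

Usage: `iommu_enabled && echo "IOMMU is on" || echo "IOMMU is off"`. On a kernel built with CONFIG_INTEL_IOMMU_DEFAULT_ON the parameter would not be needed, so the first test is specific to a setup like mine.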

With that problem popped off the stack, back to unbind / bind. The device I want to pass through to the VM is the physical NIC lan0:

Before unbind, the link LED is on. Status:

[tong@T7 ~]$ lspci -vv -nn -s :19.0
:19.0 Ethernet controller []: Intel Corporation Ethernet Connection () I218-V [:15a3] (rev )
Subsystem: Lenovo Device [17aa:]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency:
Interrupt: pin A routed to IRQ
Region : Memory at f2200000 (-bit, non-prefetchable) [size=128K]
Region : Memory at f223e000 (-bit, non-prefetchable) [size=4K]
Region : I/O ports at [size=]
Capabilities: <access denied>
Kernel driver in use: e1000e
Kernel modules: e1000e
[tong@T7 ~]$ sudo ip link show dev lan0
: lan0: <BROADCAST,MULTICAST> mtu qdisc noop state DOWN mode DEFAULT group default qlen
link/ether :7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
[tong@T7 ~]$ ll /sys/bus/pci/devices/0000\:00\:19.0/ |grep driver
lrwxrwxrwx root root Sep : driver -> ../../../bus/pci/drivers/e1000e
-rw-r--r-- root root Sep : driver_override
[tong@T7 ~]$

unbind: (I don't know why the first attempt fails. Maybe someday someone can tell me, if you see the code below. But it doesn't matter.)

[tong@T7 ~]$ sudo echo 0000:00:19.0 > /sys/bus/pci/devices/0000\:00\:19.0/driver/unbind
bash: /sys/bus/pci/devices/0000:00:19.0/driver/unbind: Permission denied
[tong@T7 ~]$ sudo su -
[root@T7 ~]# echo 0000:00:19.0 > /sys/bus/pci/devices/0000\:00\:19.0/driver/unbind
[root@T7 ~]#
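
As for why the sudo echo attempt is denied: the "> file" redirection is set up by the calling shell, which still runs as the normal user, before sudo even starts. Only echo itself runs as root, and echo never opens the file. A minimal sketch of the common workaround, letting a root-owned tee open the file instead; the helper name write_as_root and the SUDO override variable are made up for illustration:

```shell
#!/bin/sh
# write_as_root VALUE FILE
# Write VALUE to FILE with the file opened by a privileged process.
# SUDO is a hypothetical override: unset means "sudo", empty means
# "run tee directly" (useful when already root).
write_as_root() {
    # tee runs under sudo, so it is tee (as root) that opens FILE,
    # not the unprivileged shell that called this function
    printf '%s\n' "$1" | ${SUDO-sudo} tee "$2" > /dev/null
}
```

With this, the unbind could be written as `write_as_root 0000:00:19.0 /sys/bus/pci/devices/0000:00:19.0/driver/unbind` without switching to a root shell first.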

After a successful unbind, the states compare as follows. The NIC LED is still on:

[root@T7 ~]# lspci -vv -nn -s :19.0
:19.0 Ethernet controller []: Intel Corporation Ethernet Connection () I218-V [:15a3] (rev )
Subsystem: Lenovo Device [17aa:]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ
Region : Memory at f2200000 (-bit, non-prefetchable) [size=128K]
Region : Memory at f223e000 (-bit, non-prefetchable) [size=4K]
Region : I/O ports at [size=]
Capabilities: [c8] Power Management version
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel= DScale= PME-
Capabilities: [d0] MSI: Enable- Count=/ Maskable- 64bit+
Address: Data:
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel modules: e1000e
[root@T7 ~]# ip link show dev lan0
Device "lan0" does not exist.
[root@T7 ~]# ip link show
: lo: <LOOPBACK,UP,LOWER_UP> mtu qdisc noqueue state UNKNOWN mode DEFAULT group default qlen
link/loopback ::::: brd :::::
: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu qdisc mq state UP mode DORMANT group default qlen
link/ether dc:::6c:b5:7e brd ff:ff:ff:ff:ff:ff
: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu qdisc noqueue state UNKNOWN mode DEFAULT group default qlen
link/ether :4a::a1:4f: brd ff:ff:ff:ff:ff:ff
[root@T7 ~]# ll /sys/bus/pci/devices/0000\:00\:19.0/ |grep driver
-rw-r--r-- root root Sep : driver_override
[root@T7 ~]#

bind to vfio:

[root@T7 ~]# modprobe vfio_pci
[root@T7 ~]# lsmod |grep vfio
vfio_pci
vfio_iommu_type1
vfio_virqfd vfio_pci
vfio vfio_iommu_type1,vfio_pci
irqbypass kvm,vfio_pci
[root@T7 ~]# echo "8086 15a3" > /sys/bus/pci/drivers/vfio-pci/new_id
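
Writing to new_id makes vfio-pci claim every unbound device with that vendor/device ID. A narrower alternative is the driver_override file that shows up in the sysfs listings above. A sketch under assumptions: it must run as root with vfio-pci already loaded, and the function name bind_to_vfio and its optional sysfs-root argument are hypothetical. This is not what I actually ran:

```shell
#!/bin/sh
# bind_to_vfio ADDR [SYSFS_ROOT]
# Rebind a single PCI device to vfio-pci using driver_override.
# SYSFS_ROOT defaults to /sys (hypothetical parameter, for experimenting only).
bind_to_vfio() {
    addr=$1
    root=${2:-/sys}
    dev="$root/bus/pci/devices/$addr"
    # Detach the device from its current driver, if it has one
    if [ -e "$dev/driver" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi
    # From now on only a driver named vfio-pci may claim this device
    echo vfio-pci > "$dev/driver_override"
    # Ask the PCI core to probe the device again
    echo "$addr" > "$root/bus/pci/drivers_probe"
}
```

Usage would be `bind_to_vfio 0000:00:19.0`. Unlike new_id, this touches only the one device and leaves other NICs with the same ID alone.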

After a successful bind, the various states:

[root@T7 ~]# ll /sys/bus/pci/devices/0000\:00\:19.0/iommu_group/devices/
total
lrwxrwxrwx root root Sep : ::19.0 -> ../../../../devices/pci0000:/::19.0
[root@T7 ~]# ll /dev/vfio/
total
crw------- root root , Sep :
crw-rw-rw- root root , Sep : vfio
[root@T7 ~]# ll /sys/bus/pci/devices/0000\:00\:19.0/iom*
lrwxrwxrwx root root Sep : /sys/bus/pci/devices/::19.0/iommu -> ../../virtual/iommu/dmar1
lrwxrwxrwx root root Sep : /sys/bus/pci/devices/::19.0/iommu_group -> ../../../kernel/iommu_groups/
[root@T7 ~]# dmesg |tail
... ...
[ 1027.806155] e1000e ::19.0 lan0: removed PHC
[ 1394.134555] VFIO - User Level meta-driver version: 0.3
[root@T7 ~]# lspci -vv -nn -s :19.0
:19.0 Ethernet controller []: Intel Corporation Ethernet Connection () I218-V [:15a3] (rev )
Subsystem: Lenovo Device [17aa:]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ
Region : Memory at f2200000 (-bit, non-prefetchable) [disabled] [size=128K]
Region : Memory at f223e000 (-bit, non-prefetchable) [disabled] [size=4K]
Region : I/O ports at [disabled] [size=]
Capabilities: [c8] Power Management version
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D3 NoSoftRst- PME-Enable- DSel= DScale= PME-
Capabilities: [d0] MSI: Enable- Count=/ Maskable- 64bit+
Address: Data:
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: vfio-pci
Kernel modules: e1000e
[root@T7 ~]# ip link
: lo: <LOOPBACK,UP,LOWER_UP> mtu qdisc noqueue state UNKNOWN mode DEFAULT group default qlen
link/loopback ::::: brd :::::
: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu qdisc mq state UP mode DORMANT group default qlen
link/ether dc:::6c:b5:7e brd ff:ff:ff:ff:ff:ff
: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu qdisc noqueue state UNKNOWN mode DEFAULT group default qlen
link/ether :4a::a1:4f: brd ff:ff:ff:ff:ff:ff
[root@T7 ~]#
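
According to the kernel VFIO document, a group is only usable when every device in it is bound to a VFIO driver (or to no driver at all). My NIC happens to be alone in its group, but on other hardware it is worth listing the neighbours first. A minimal sketch, assuming the sysfs layout shown above; the function name group_members and its optional sysfs-root argument are made up for illustration:

```shell
#!/bin/sh
# group_members ADDR [SYSFS_ROOT]
# Print "<pci-address> <driver>" for every device that shares an
# IOMMU group with ADDR. SYSFS_ROOT defaults to /sys (hypothetical
# parameter, for experimenting only).
group_members() {
    addr=$1
    root=${2:-/sys}
    for d in "$root/bus/pci/devices/$addr/iommu_group/devices"/*; do
        # Skip the unexpanded pattern when the directory is missing or empty
        [ -e "$d" ] || continue
        if [ -L "$d/driver" ]; then
            drv=$(basename "$(readlink "$d/driver")")
        else
            drv="<none>"
        fi
        echo "$(basename "$d") $drv"
    done
}
```

On my machine after the bind, `group_members 0000:00:19.0` should print a single line ending in vfio-pci.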

6. Start the VM and test. Inside the VM there is one more NIC; it can receive layer-2 broadcasts from the switch and can get an address via DHCP:

[root@dpdk ~]# lspci -nn
:00.0 Host bridge []: Intel Corporation 440FX - 82441FX PMC [Natoma] [:] (rev )
:01.0 ISA bridge []: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [:]
:01.1 IDE interface []: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [:]
:01.3 Bridge []: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [:] (rev )
:02.0 VGA compatible controller []: Device [:] (rev )
:03.0 Ethernet controller []: Red Hat, Inc Virtio network device [1af4:]
:04.0 Ethernet controller []: Intel Corporation Ethernet Connection () I218-V [:15a3] (rev )
:05.0 SCSI storage controller []: Red Hat, Inc Virtio block device [1af4:]
[root@dpdk ~]# ip link
: lo: <LOOPBACK,UP,LOWER_UP> mtu qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback ::::: brd :::::
: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu qdisc pfifo_fast state UP mode DEFAULT qlen
link/ether ::::: brd ff:ff:ff:ff:ff:ff
: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu qdisc pfifo_fast state UP mode DEFAULT qlen
link/ether :7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
[root@dpdk ~]# tcpdump -i ens4 -nn -c
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens4, link-type EN10MB (Ethernet), capture size bytes
::32.969547 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length
::33.970617 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length

7. Can the device be shared??? I'll try starting a second VM and see.

[tong@T7 CentOS7]$ ./start.sh
[tong@T7 CentOS7]$ qemu-system-x86_64: -device vfio-pci,host=::19.0: vfio: error opening /dev/vfio/: Device or resource busy
qemu-system-x86_64: -device vfio-pci,host=::19.0: vfio: failed to get group
qemu-system-x86_64: -device vfio-pci,host=::19.0: Device initialization failed
^C
[tong@T7 CentOS7]$

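The answer is no! The second QEMU cannot open the VFIO group while the first VM is holding it, hence "Device or resource busy".

And with that, PCI NIC passthrough with vfio is configured! : )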

05-08 15:21