This article covers how to read TCP and UDP packets from the same socket.

Question

I am trying to read packets on a router, like this in Python:

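import socket

# (skipping the exception handling code here)
# 0x0003 is ETH_P_ALL: receive frames of every protocol (Linux only, needs root)
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))
while True:
    p = s.recvfrom(2000)  # returns (data, address)
    pkt = p[0]
    # process pkt here ...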

Answers to a related question (36115971) say that the parameters and methods for UDP and TCP data differ: some say recv is for TCP and recvfrom is for UDP, while others say the opposite; similarly, some suggest 1024 as the buffer size for TCP and a larger one for UDP, and others say the reverse. Since I am reading on a router, I do not have separate sockets for TCP and UDP, so I need to read both from the same socket, and I am a bit confused about how I should read the incoming packets.

(1) Should I use recv() or recvfrom(), if I want to read both TCP and UDP packets?

(2) Do the calls return data one packet at a time, or do they return after the buffer is filled up? E.g., if I have a large buffer of 4096 bytes and two incoming packets of 2400 bytes each, will the call return as soon as the 1st packet ends, or will it return after also filling up the buffer from the 2nd packet?

(2a) Same question, but with a smaller buffer of 2000 bytes. It is clear that on the 1st call I will get the first 2000 bytes of the 1st packet. But on the next call, will I get the last 400 bytes of the 1st packet, or the first 2000 bytes of the 2nd packet?

(3) If I am delayed in making the next call, maybe because I was busy processing the 1st dataset, am I in danger of losing data, or will the OS keep its internal queue of the incoming packets to be given to me when I call the next time? If the OS keeps its internal queue, where can I find information about its size?

NOTE: Some of the given replies have been divergent, so let me put in some boundaries to my question. Hopefully these restrictions will help to give more specific answers.

(a) My objective is to sniff the incoming packets with Python sockets only, so other solutions involving tcpdump, tshark, etc. are outside the scope.

(b) The objective is only to sniff incoming packets. Additional details like packet reordering (for connection-oriented protocols like TCP) are outside the scope; they are actually avoidable overhead.

Recommended answer

If you're reading packets from a raw socket (as shown in your source code), then you can easily read all packets from the same socket. Be sure this is what you intend to do. A raw socket is for doing packet inspection for troubleshooting, forensic, security or educational purposes. You cannot easily communicate with another system this way.

And likewise, the receive calls will not differ here by protocol because you are not actually using TCP or UDP, you're simply receiving the raw packets that those protocols build and decode.
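
Because every frame arrives through the same call, telling TCP from UDP becomes a matter of inspecting the headers yourself. The following is a minimal sketch of that step, assuming untagged Ethernet frames carrying IPv4; the function name classify_frame and the constants are illustrative and not part of the original answer.

```python
import struct

ETH_HEADER_LEN = 14        # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_NAMES = {6: "TCP", 17: "UDP"}  # IANA protocol numbers


def classify_frame(frame: bytes) -> str:
    """Return "TCP", "UDP", or "other" for a raw Ethernet frame (IPv4 only)."""
    # Need at least an Ethernet header plus a minimal 20-byte IPv4 header.
    if len(frame) < ETH_HEADER_LEN + 20:
        return "other"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return "other"
    ip = frame[ETH_HEADER_LEN:]
    if ip[0] >> 4 != 4:    # high nibble of the first byte is the IP version
        return "other"
    # Byte 9 of the IPv4 header is the protocol field.
    return IPPROTO_NAMES.get(ip[9], "other")
```

In the receive loop from the question this would be called as classify_frame(pkt). IPv6 and VLAN-tagged frames fall into "other" here and would need extra cases.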

(1) Should I use recv() or recvfrom(), if I want to read both TCP and UDP packets?

Either one will work. recv() will return to you only the actual packet data, while recvfrom will return to you the data along with metadata about the packet, including the interface from which the data was received (and other things defined in struct sockaddr_ll from the packet(7) man page).
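
To make the metadata concrete: for an AF_PACKET socket, Python's recvfrom() returns the address as a tuple (ifname, proto, pkttype, hatype, addr) that mirrors struct sockaddr_ll. The sketch below formats such a tuple. The helper name describe_origin is made up for illustration, and the packet-type numbers are the PACKET_* values from the Linux headers, written out as literals so the snippet also loads on non-Linux systems.

```python
# Linux PACKET_* packet types (values from linux/if_packet.h).
PKTTYPE_NAMES = {
    0: "host",        # addressed to this machine
    1: "broadcast",
    2: "multicast",
    3: "otherhost",   # seen only because the interface is in promiscuous mode
    4: "outgoing",    # sent by this machine
}


def describe_origin(addr: tuple) -> str:
    """Format the address tuple that recvfrom() returns on an AF_PACKET socket."""
    ifname, proto, pkttype, hatype, hwaddr = addr
    kind = PKTTYPE_NAMES.get(pkttype, "unknown")
    mac = ":".join(f"{b:02x}" for b in hwaddr)
    return f"{ifname} proto=0x{proto:04x} {kind} from {mac}"


# Typical use inside the sniffing loop (requires root on Linux):
#   data, addr = s.recvfrom(65535)
#   print(describe_origin(addr))
```

If only incoming packets are of interest, the pkttype field is one way to skip frames marked "outgoing".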

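(2) Do the calls return data one packet at a time, or do they return after the buffer is filled up? E.g., if I have a large buffer of 4096 bytes and two incoming packets of 2400 bytes each, will the call return as soon as the 1st packet ends, or will it return after also filling up the buffer from the 2nd packet?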

When using a raw socket like this, you get exactly one packet at a time. You will never get more than one. If the buffer you give is not large enough, then the packet will be truncated (with the ending bytes discarded).
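
The truncation behaviour can be tried without root privileges. The sketch below is an analogy rather than a packet capture: it uses a Unix-domain datagram socket pair, which preserves message boundaries in the same way, and replays the 2400-byte packets and 2000-byte buffer from the question. The function name read_with_small_buffer is hypothetical, and the behaviour described is what I would expect on Linux.

```python
import socket


def read_with_small_buffer(packet_len=2400, buf_len=2000):
    """Send two datagrams, then read each one with a buffer that is too small."""
    # Stand-in for the packet socket: datagram sockets keep message boundaries.
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.send(b"A" * packet_len)   # "1st packet"
        tx.send(b"B" * packet_len)   # "2nd packet"
        results = []
        for _ in range(2):
            data, _ancdata, flags, _addr = rx.recvmsg(buf_len)
            # MSG_TRUNC in the returned flags means the tail was discarded.
            truncated = bool(flags & socket.MSG_TRUNC)
            results.append((data[:1], len(data), truncated))
        return results
    finally:
        tx.close()
        rx.close()
```

The second read starts with b"B": the last 400 bytes of the first datagram are gone rather than carried over to the next call. Checking MSG_TRUNC via recvmsg() is one way to notice that the buffer was too small.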

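(2a) Same question, but with a smaller buffer of 2000 bytes. It is clear that on the 1st call I will get the first 2000 bytes of the 1st packet. But on the next call, will I get the last 400 bytes of the 1st packet, or the first 2000 bytes of the 2nd packet?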

Generally speaking, packets on most networks are limited to about 1514 bytes. This is because the traditional "MTU" (Maximum Transmission Unit) configured on the network interface is 1500 bytes, and an Ethernet header containing two MAC addresses (6 bytes each) plus a two-byte EtherType is usually prepended to that. In a switch or router, you may also see packets with an additional 4-byte VLAN header (IEEE 802.1Q). (However, some networks internally use "jumbo" packets of up to about 9K for specific purposes.)
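
The arithmetic behind those numbers, as a small sketch for sizing the receive buffer. The helper max_frame_len is illustrative only, and the 9000-byte jumbo MTU is a common value assumed here, not one stated in the answer.

```python
ETH_HEADER_LEN = 14   # two 6-byte MAC addresses + 2-byte EtherType
VLAN_TAG_LEN = 4      # IEEE 802.1Q tag


def max_frame_len(mtu=1500, vlan_tagged=False):
    """Largest frame to expect for a given MTU, excluding the trailing checksum."""
    return mtu + ETH_HEADER_LEN + (VLAN_TAG_LEN if vlan_tagged else 0)


standard = max_frame_len()                   # 1500 + 14
tagged = max_frame_len(vlan_tagged=True)     # 1500 + 14 + 4
jumbo = max_frame_len(mtu=9000)              # assumed jumbo MTU of 9000
```

A 2000-byte buffer therefore covers standard and VLAN-tagged frames but not jumbo frames; passing 65535 to recvfrom() is a simple way to avoid truncation in either case.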

You should also understand that, in writing an application, one can send UDP datagrams (or TCP buffers) larger than the maximum packet size. In that case, the OS breaks those up into smaller chunks for sending (and they are re-assembled on the destination side before being handed to an application). When you're receiving raw packets like this, you will see the packets in their low-level, possibly fragmented, state.
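
Whether a captured IPv4 packet is such a fragment can be read from the flags and fragment-offset field of its header. A minimal sketch, assuming the caller passes the IPv4 header (the bytes following the 14-byte Ethernet header); the function name ipv4_fragment_info is hypothetical.

```python
import struct

MORE_FRAGMENTS = 0x2000   # "MF" bit in the flags/fragment-offset field
OFFSET_MASK = 0x1FFF      # low 13 bits: fragment offset in 8-byte units


def ipv4_fragment_info(ip_header: bytes):
    """Return (is_fragment, offset_bytes, more_fragments) for an IPv4 header."""
    # Bytes 6-7 hold 3 flag bits followed by the 13-bit fragment offset.
    (flags_frag,) = struct.unpack("!H", ip_header[6:8])
    more_fragments = bool(flags_frag & MORE_FRAGMENTS)
    offset_bytes = (flags_frag & OFFSET_MASK) * 8
    is_fragment = more_fragments or offset_bytes > 0
    return is_fragment, offset_bytes, more_fragments
```

Only the first fragment (offset 0) carries the UDP header, so a sniffer that parses ports would typically need to skip or specially handle fragments with a non-zero offset.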

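(3) If I am delayed in making the next call, maybe because I was busy processing the 1st dataset, am I in danger of losing data, or will the OS keep its internal queue of the incoming packets to be given to me when I call the next time? If the OS keeps its internal queue, where can I find information about its size?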

The OS will keep a queue of packets for you. The size is of course limited, since there is no way you could keep up with, say, a 1 Gb NIC at full line rate (let alone a 10 Gb or faster NIC), and packets that arrive while the queue is full are dropped. The size is configured in a system-specific way. On Linux, and probably other Unix-based systems, you can call getsockopt with SOL_SOCKET / SO_RCVBUF to get an idea of the queue space available.

On Linux, at least, the size can be set with setsockopt up to a system-imposed maximum (which itself can be configured with various sysctl settings).
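
A short sketch of both calls. It uses an ordinary UDP socket so that it runs without root; the same getsockopt/setsockopt calls apply to the AF_PACKET socket from the question. The function name receive_buffer_sizes and the 1 MiB request are arbitrary choices for illustration.

```python
import socket


def receive_buffer_sizes(requested=1 << 20):
    """Return the socket receive buffer size before and after requesting a new one."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        before = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # The kernel may clamp the request to its configured maximum.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        after = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        return before, after
    finally:
        s.close()


before, after = receive_buffer_sizes()
print(f"SO_RCVBUF: {before} bytes by default, {after} bytes after the request")
```

On Linux the value read back is usually double the (possibly clamped) request, because the kernel reserves room for bookkeeping overhead, and the cap for unprivileged processes is the net.core.rmem_max sysctl.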

08-05 13:34