This article looks at an unexpected result when packing and unpacking values with Python's struct module, and how to fix it.

Problem description

I'm trying to design a system to react to different binary flags.

0 = Error
1 = Okay
2 = Logging
3 = Number

The sequence of this data represents a unique ID to reference the work, the flag and the number. Everything works, except the number flag. This is what I get...

>>> import struct
>>> data = (1234, 3, 12345678)
>>> bin = struct.pack('QHL', *data)
>>> print(bin)
b'\xd2\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00Na\xbc\x00\x00\x00\x00\x00'
>>> result = struct.unpack_from('QH', bin, 0)
>>> print(result)
(1234, 3)
>>> offset = struct.calcsize('QH')
>>> result += struct.unpack_from('L', bin, offset)
>>> print(result)
(1234, 3, 7011541669862440960)

A long should be plenty big to represent the number 12345678, but why is it incorrectly unpacked?

Edit:

When I try to pack them separately, it looks like struct is adding too many null bytes between the flag and the long.

>>> import struct
>>> struct.pack('QH', 1234, 3)
b'\xd2\x04\x00\x00\x00\x00\x00\x00\x03\x00'
>>> struct.pack('L', 12345678)
b'Na\xbc\x00\x00\x00\x00\x00'

I can reproduce this error by adding padding before the long.

>>> struct.unpack('L', struct.pack('L', 12345678))
(12345678,)
>>> struct.unpack('xL', struct.pack('xL', 12345678))
(12345678,)
>>> struct.pack('xL', 12345678)
b'\x00\x00\x00\x00\x00\x00\x00\x00Na\xbc\x00\x00\x00\x00\x00'

Potential fix?

When I use little-endian order, the problem seems to correct itself and make the binary string shorter. Since this is destined for an SSL-wrapped TCP socket, that's a win-win, right? Keeping bandwidth low is generally good, yes?

>>> import struct
>>> data = (1234, 3, 12345678)
>>> bin = struct.pack('<QHL', *data)
>>> print(bin)
b'\xd2\x04\x00\x00\x00\x00\x00\x00\x03\x00Na\xbc\x00'
>>> result = struct.unpack_from('<QH', bin, 0)
>>> print(result)
(1234, 3)
>>> offset = struct.calcsize('<QH')
>>> result += struct.unpack_from('<L', bin, offset)
>>> print(result)
(1234, 3, 12345678)

Why does this happen? I am perplexed.

Solution

You are running into byte alignment issues. By default, the individual parts of a struct are not simply placed next to each other; they are aligned in memory the way a C compiler would align them. This makes access more efficient, especially for other applications, which can read each field directly without having to account for overlap.

You can see this by using struct.calcsize to check how much space a format needs:

>>> struct.calcsize('QHL')
16
>>> struct.calcsize('QH')
10

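As you can see, QHL requires 16 bytes, but QH requires 10. The L we left off is, however, only 4 bytes wide. So some padding is added to make sure that the L starts on "a fresh block". This is because, in native mode, each type must start at an offset that is a multiple of its own size, with padding inserted as needed. (These sizes come from a platform where a native L is 4 bytes. In the output shown in the question, the L is 8 bytes wide, so it starts at offset 16, after six padding bytes, and the whole record is 24 bytes.) For QH it looks like this: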

QQ QQ | QQ QQ | HH

Once you use QHL, you get the following:

QQ QQ | QQ QQ | HH 00 | LL LL

As you can see, there were two padding bytes added to make sure that L starts on a new block of four.
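
The padding can also be measured instead of read off a diagram. The following is a minimal sketch; the helper names native_alignment and padding_before are made up for illustration and are not part of the struct module. The numbers it prints depend on the platform, because a native L is 4 bytes on some systems and 8 on others.

```python
import struct


def native_alignment(code):
    """Alignment of one format code in native mode (platform dependent)."""
    # A one-byte 'c' in front forces padding up to the code's alignment,
    # so the size difference is exactly that alignment.
    return struct.calcsize('c' + code) - struct.calcsize(code)


def padding_before(prefix, code):
    """Pad bytes that native mode inserts between prefix and code."""
    return (struct.calcsize(prefix + code)
            - struct.calcsize(prefix)
            - struct.calcsize(code))


align_l = native_alignment('L')
pad = padding_before('QH', 'L')
l_offset = struct.calcsize('QH') + pad  # where the L really starts

# Typically 4, 2, 12 where a native L is 4 bytes,
# and 8, 6, 16 where it is 8 bytes (as in the question's output).
print(align_l, pad, l_offset)
```

The value of l_offset, not struct.calcsize('QH'), is the offset that unpack_from needs in order to read the L from natively packed data.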

You can modify the alignment (as well as the endianness) using a special character at the beginning of the format string. In your case, you could use =QHL, which keeps the native byte order but disables alignment altogether and uses standard sizes (4 bytes for L):

QQ QQ | QQ QQ | HH LL | LL
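
As a rough illustration of the unpadded layout, the sketch below packs the question's data with the = and < prefixes. Both use the standard sizes (Q is 8 bytes, H is 2, L is 4) and add no padding, so the record is 14 bytes on any platform and the split unpacking from the question works.

```python
import struct

data = (1234, 3, 12345678)

# '=' keeps the native byte order, '<' forces little-endian.
# Both use standard sizes and insert no padding.
for prefix in ('=', '<'):
    fmt = prefix + 'QHL'
    packed = struct.pack(fmt, *data)
    print(fmt, struct.calcsize(fmt), struct.unpack(fmt, packed))

# Without padding, the L starts exactly where the 'QH' part ends,
# so unpacking in two steps gives the right answer.
packed = struct.pack('<QHL', *data)
head = struct.unpack_from('<QH', packed, 0)
tail = struct.unpack_from('<L', packed, struct.calcsize('<QH'))
result = head + tail
print(result)
```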


As for your follow-up question: yes, using an explicit byte order such as < also disables alignment, so that is where the effect comes from. Whether it is a good idea to turn off alignment depends, though. If you want to consume your data somewhere else, in other programs, it would be a good idea to stick to native alignment.
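
If you do stay with native alignment, one possible way to avoid the wrong offset is sketched below. It assumes the question's data and format; the variable names are only illustrative. The idea is either to unpack with the same format that was used to pack, or to derive the offset of the L from the size of the complete format.

```python
import struct

data = (1234, 3, 12345678)
record = struct.Struct('QHL')  # native sizes and alignment
packed = record.pack(*data)

# Simplest fix: unpack with the same format that was used to pack.
whole = record.unpack(packed)

# If the number has to be read on its own, its offset is not the size
# of 'QH'. It is the total size minus the size of the trailing L.
l_offset = record.size - struct.calcsize('L')
number, = struct.unpack_from('L', packed, l_offset)

print(whole, number)
```

This works regardless of how wide a native L is, because both the offset and the field size come from struct.calcsize on the running platform.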


11-03 06:23