Parsing C structures with Python

Problem description

I'm sure this is terribly wrong, and I'm having a couple of problems. I've written out an array of WIN32_FIND_DATAW structures to disk, one after another, and I'd like to consume and parse them in my Python script.

The code I'm currently using is:

>>> fp = open('findData', 'r').read()
>>> data = ctypes.cast(fp, ctypes.POINTER(wintypes.WIN32_FIND_DATAW))
>>> print str(data[0].cFileName)

The first problem is that the third line doesn't print a nice string like I would expect. Instead of printing $Recycle.Bin it prints UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

This is the result of just printing the data stored there:

>>> data[0].cFileName
u'\U00520024\U00630065\U00630079\U0065006c\U0042002e\U006e0069'

This looks relatively reasonable. $ is ASCII 0x24, R is ASCII 0x52 and so on.

So why can't I print it like a string?

My second problem is that:

>>> data[1].cFileName

Gives me ridiculous data. I'm fairly sure I'm not using that ctypes.cast correctly. How should I be doing it to access these? To clarify, in C, I'd just point a PWIN32_FIND_DATAW pointer to the beginning of the buffer and access the individual structs in the array using similar code, and I'm trying to do the same in Python.

Update

Doing this:

>>> data[0].cFileName.encode('windows-1252')

Produces this error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

Update

The beginning of the first entry (data[0] up to the first part of cFileName) looks like the following:

user@ubuntu:~/data$ hexdump -C findData | head -n 6
00000000  16 00 00 00 dc 5a 9f d2  31 04 ca 01 ba 81 89 1a  |.....Z..1.......|
00000010  81 e2 cd 01 ba 81 89 1a  81 e2 cd 01 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 24 00 52 00  |............$.R.|
00000030  65 00 63 00 79 00 63 00  6c 00 65 00 2e 00 42 00  |e.c.y.c.l.e...B.|
00000040  69 00 6e 00 00 00 00 00  00 00 00 00 00 00 00 00  |i.n.............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

I can post more data if needed.

Recommended answer

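As already mentioned in the comments, this is due to differences between Windows and Linux. The ctypes module adapts to the local platform: the file was written on Windows, where each character of cFileName is a 2-byte UTF-16 code unit, but on Linux wchar_t is normally 4 bytes wide, so ctypes reads two characters as one and gets the structure size wrong, hence the mismatch. The best solution is to use the struct module to handle the data in a platform-independent manner. The following code shows how this can be done for a single record.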
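The strange value u'\U00520024\U00630065...' in the question is consistent with this explanation. The following is a small sketch, using only the first eight bytes of cFileName from the hexdump, that compares reading the same bytes as 2-byte and as 4-byte units. The variable names are illustrative, and the width printed for c_wchar depends on the platform the script runs on.

```python
import ctypes
import struct

# First 8 bytes of cFileName from the hexdump: "$Rec" encoded as UTF-16-LE
raw = b"\x24\x00\x52\x00\x65\x00\x63\x00"

# Read as four 2-byte units (how Windows wrote the characters)
as_2byte_units = struct.unpack("<4H", raw)
# Read as two 4-byte units (what a 4-byte wchar_t reader would see)
as_4byte_units = struct.unpack("<2L", raw)

print([hex(u) for u in as_2byte_units])
print([hex(u) for u in as_4byte_units])

# Decoding the bytes explicitly as UTF-16-LE gives the expected text
print(raw.decode("utf-16-le"))

# Width of wchar_t as seen by ctypes on this machine (platform dependent)
print(ctypes.sizeof(ctypes.c_wchar))
```

The 4-byte reading produces 0x520024 and 0x630065, which are exactly the first two "characters" shown in the question.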

import struct

# Set up test data based on the incomplete sample from the hexdump.
# A bytes literal is used so the code runs on Python 2 and Python 3.
record = b"\x16\x00\x00\x00\xdc\x5a\x9f\xd2\x31\x04\xca\x01\xba\x81\x89\x1a\x81\xe2\xcd\x01\xba\x81\x89\x1a\x81\xe2\xcd\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x24\x00\x52\x00\x65\x00\x63\x00\x79\x00\x63\x00\x6c\x00\x65\x00\x2e\x00\x42\x00\x69\x00\x6e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
# Pad with zero bytes up to the size of one full record
record = record + b"\x00" * (592 - len(record))

# typedef struct _WIN32_FIND_DATAW {
#   DWORD    dwFileAttributes;
#   FILETIME ftCreationTime;
#   FILETIME ftLastAccessTime;
#   FILETIME ftLastWriteTime;
#   DWORD    nFileSizeHigh;
#   DWORD    nFileSizeLow;
#   DWORD    dwReserved0;
#   DWORD    dwReserved1;
#   WCHAR    cFileName[MAX_PATH];
#   WCHAR    cAlternateFileName[14];
# } WIN32_FIND_DATAW;

# "<" = little-endian without padding, L = 4-byte DWORD, Q = 8-byte FILETIME,
# 520s and 28s = the raw UTF-16 bytes of the two name fields
fmt = "<L3Q4L520s28s"

attrs, creation, access, write, sizeHigh, sizeLow, reserved0, reserved1, name, alternateName = struct.unpack(fmt, record)
# The names are stored as UTF-16-LE and padded with NUL characters
name = name.decode('utf-16-le').strip('\x00')
alternateName = alternateName.decode('utf-16-le').strip('\x00')
print(name)
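The three time fields come out of struct.unpack as plain 64-bit integers. If readable timestamps are wanted, a FILETIME value can be converted by hand, since it counts 100-nanosecond intervals since 1601-01-01 00:00 UTC. This is a minimal sketch; the helper name filetime_to_datetime is made up for illustration and the result is a naive datetime in UTC.

```python
from datetime import datetime, timedelta

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00 UTC
FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME integer to a naive UTC datetime."""
    # 10 intervals of 100 ns make one microsecond
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


# ftCreationTime of the sample record: bytes dc 5a 9f d2 31 04 ca 01,
# read as one little-endian 64-bit value
creation = 0x01CA0431D29F5ADC
print(filetime_to_datetime(creation))
```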

NOTE: The 520s field in the format string assumes that MAX_PATH is 260 (260 two-byte characters = 520 bytes), which should be true, but you never know.
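One way to make that assumption explicit is to build the format string from named constants and let struct.calcsize report the record size. This is only a sketch: MAX_PATH and WCHAR_SIZE are assumed values written out here, not read from the Windows headers.

```python
import struct

MAX_PATH = 260    # assumed value, normally defined in the Windows headers
WCHAR_SIZE = 2    # assumed: one UTF-16 code unit per character on Windows

# Build the format from the constants instead of hard-coding 520 and 28
fmt = "<L3Q4L%ds%ds" % (MAX_PATH * WCHAR_SIZE, 14 * WCHAR_SIZE)
record_size = struct.calcsize(fmt)

print(fmt)
print(record_size)
```

If the size of the data file is not a whole multiple of record_size, that is a hint that one of the assumptions does not hold for the file.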

To read all values from the file, read it in blocks of 592 bytes (one record per block) and decode each block as above.
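A minimal sketch of such a loop follows. The helper names (decode_name, read_records, make_record) are made up for illustration, and the demo at the end runs on two synthetic records held in memory rather than on the real file. Cutting the name at the first NUL, instead of stripping NULs, is a defensive choice in case the bytes after the terminator are not zeroed.

```python
import io
import struct

RECORD = struct.Struct("<L3Q4L520s28s")  # one 592-byte WIN32_FIND_DATAW


def decode_name(raw):
    """Decode a fixed-size UTF-16-LE field, cutting at the first NUL."""
    text = raw.decode("utf-16-le", errors="replace")
    return text.split("\x00", 1)[0]


def read_records(fileobj):
    """Yield (attributes, size, name) for each complete record in fileobj."""
    while True:
        block = fileobj.read(RECORD.size)
        if len(block) < RECORD.size:
            break  # end of file, or a truncated trailing record
        fields = RECORD.unpack(block)
        attrs, size_high, size_low = fields[0], fields[4], fields[5]
        # The file size is split into a high and a low 32-bit half
        yield attrs, (size_high << 32) | size_low, decode_name(fields[8])


def make_record(name, attrs=0x10):
    """Build a synthetic record for the demo (not real data)."""
    return RECORD.pack(attrs, 0, 0, 0, 0, 0, 0, 0,
                       name.encode("utf-16-le"), b"")


# Demo on in-memory data; struct pads the short name fields with NULs
sample = io.BytesIO(make_record("$Recycle.Bin", 0x16) + make_record("Users"))
for attrs, size, name in read_records(sample):
    print(hex(attrs), size, name)
```

For the real data, the same generator can be fed a file opened in binary mode, for example open('findData', 'rb'). The question opens the file with 'r', which is text mode.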
