Problem description
I'm sure this is terribly wrong, and I'm having a couple of problems. I've written out an array of WIN32_FIND_DATAW structures to disk, one after another, and I'd like to consume and parse them in my Python script.
The code I'm currently using is:
>>> fp = open('findData', 'r').read()
>>> data = ctypes.cast(fp, ctypes.POINTER(wintypes.WIN32_FIND_DATAW))
>>> print str(data[0].cFileName)
The first problem is that the third line doesn't print a nice string like I would expect. Instead of printing $Recycle.Bin, it raises UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
This is the result of just printing the data stored there:
>>> data[0].cFileName
u'\U00520024\U00630065\U00630079\U0065006c\U0042002e\U006e0069'
This looks relatively reasonable: $ is ASCII 0x24, R is ASCII 0x52, and so on.
So why can't I print it like a string?
My second problem is this:
>>> data[1].cFileName
That gives me ridiculous data. I'm fairly sure I'm not using ctypes.cast correctly. How should I be doing it to access these? To clarify, in C, I'd just point a PWIN32_FIND_DATAW pointer to the beginning of the buffer and access the individual structs in the array using similar code, and I'm trying to do the same in Python.
Update
Doing this:
>>> data[0].cFileName.encode('windows-1252')
produces this error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
Update
The beginning of the first entry (data[0] up to the first part of cFileName) looks like the following:
user@ubuntu:~/data$ hexdump -C findData | head -n 6
00000000 16 00 00 00 dc 5a 9f d2 31 04 ca 01 ba 81 89 1a |.....Z..1.......|
00000010 81 e2 cd 01 ba 81 89 1a 81 e2 cd 01 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 24 00 52 00 |............$.R.|
00000030 65 00 63 00 79 00 63 00 6c 00 65 00 2e 00 42 00 |e.c.y.c.l.e...B.|
00000040 69 00 6e 00 00 00 00 00 00 00 00 00 00 00 00 00 |i.n.............|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
I can post more data if needed.
Recommended answer
As already mentioned in the comments, this is due to differences between Windows and Linux. The ctypes module adapts to the local platform, and on Linux wchar_t is typically 4 bytes wide rather than the 2 bytes Windows uses. Each pair of UTF-16 code units in cFileName is therefore read as a single character, and the structure size no longer matches the records on disk, which is why data[1] looks like garbage. The best solution is to use the struct module to handle the data in a platform-independent manner. The following code shows how this can be done for a single record.
import struct

# Set up test data based on the incomplete sample from the hexdump
record = (
    b"\x16\x00\x00\x00\xdc\x5a\x9f\xd2\x31\x04\xca\x01\xba\x81\x89\x1a"
    b"\x81\xe2\xcd\x01\xba\x81\x89\x1a\x81\xe2\xcd\x01\x00\x00\x00\x00"
    b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x24\x00\x52\x00"
    b"\x65\x00\x63\x00\x79\x00\x63\x00\x6c\x00\x65\x00\x2e\x00\x42\x00"
    b"\x69\x00\x6e\x00"
)
# Pad with NULs up to the full 592-byte record size
record = record + b"\x00" * (592 - len(record))

# typedef struct _WIN32_FIND_DATAW {
#   DWORD    dwFileAttributes;
#   FILETIME ftCreationTime;
#   FILETIME ftLastAccessTime;
#   FILETIME ftLastWriteTime;
#   DWORD    nFileSizeHigh;
#   DWORD    nFileSizeLow;
#   DWORD    dwReserved0;
#   DWORD    dwReserved1;
#   WCHAR    cFileName[MAX_PATH];
#   WCHAR    cAlternateFileName[14];
# } WIN32_FIND_DATAW;

# "<" means little-endian with no padding: 1 DWORD, 3 FILETIMEs (8 bytes each),
# 4 DWORDs, 260 WCHARs (520 bytes) and 14 WCHARs (28 bytes), 592 bytes in total
fmt = "<L3Q4L520s28s"
(attrs, creation, access, write, sizeHigh, sizeLow,
 reserved0, reserved1, name, alternateName) = struct.unpack(fmt, record)

# Decode as UTF-16-LE and keep only the text before the first NUL terminator
name = name.decode("utf-16-le").split(u"\x00", 1)[0]
alternateName = alternateName.decode("utf-16-le").split(u"\x00", 1)[0]
print(name)
NOTE: This assumes that MAX_PATH is 260 (which should be true, but you never know).
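To see where the odd u'\U00520024\U00630065...' value in the question comes from, the following small sketch reinterprets the first eight bytes of cFileName from the hexdump as 2-byte and as 4-byte little-endian units. The variable names are illustrative only, and the final line simply reports the wchar_t width that ctypes uses on whatever platform runs it.

```python
import ctypes
import struct

# First eight bytes of cFileName, copied from the hexdump in the question
raw = b"\x24\x00\x52\x00\x65\x00\x63\x00"

# Windows view: four 16-bit UTF-16 code units
units16 = struct.unpack("<4H", raw)

# View with a 4-byte wchar_t: the same bytes become two 32-bit values
units32 = struct.unpack("<2L", raw)

as_utf16 = raw.decode("utf-16-le")

print(as_utf16)
print([hex(u) for u in units32])
# Width of wchar_t as seen by ctypes on the current platform
print(ctypes.sizeof(ctypes.c_wchar))
```

The 32-bit values correspond to the first two "characters" shown in the question's output, which suggests the data itself is intact and only the character width is being misread.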
To read all values from the file, open it in binary mode, read blocks of 592 bytes at a time, and decode each block as above.
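A minimal sketch of that loop is shown below, written for Python 3. The names RECORD, decode_wide and read_find_data are hypothetical helpers introduced here for illustration and are not part of the original answer. The sketch assumes the file contains nothing but tightly packed 592-byte records and stops at the first incomplete block.

```python
import struct

# Same layout as the single-record example: 592 bytes, little-endian, no padding
RECORD = struct.Struct("<L3Q4L520s28s")


def decode_wide(buf):
    """Decode a fixed-size UTF-16-LE buffer up to its first NUL terminator."""
    text = buf.decode("utf-16-le", errors="replace")
    return text.split("\x00", 1)[0]


def read_find_data(stream):
    """Yield one dict per complete record read from a binary stream."""
    while True:
        block = stream.read(RECORD.size)
        if len(block) < RECORD.size:
            break  # end of file, or a trailing partial record
        fields = RECORD.unpack(block)
        yield {
            "attributes": fields[0],
            # The file size is split across nFileSizeHigh and nFileSizeLow
            "size": (fields[4] << 32) | fields[5],
            "name": decode_wide(fields[8]),
            "alternate_name": decode_wide(fields[9]),
        }
```

With the file from the question, usage would look like `with open('findData', 'rb') as fp:` followed by `for entry in read_find_data(fp): print(entry["name"])`. Splitting at the first NUL rather than stripping NULs is a deliberate choice here, since the bytes after the terminator are not guaranteed to be zero.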