Problem description
I have a large binary file that I would like to read in and unpack using struct.unpack(). The file consists of a number of lines, each 2957 bytes long. I read in the file using the following code:
with open("bin_file", "rb") as f:
    line = f.read(2957)
My question is: why is the size returned by
import sys
sys.getsizeof(line)
not equal to 2957 (in my case it is 2978)?
Solution
You misunderstand what sys.getsizeof() does. It returns the amount of memory Python uses for a string object, not the length of the line. Python string objects track reference counts, the object type and other metadata together with the actual characters, so 2978 bytes is not the same thing as the string length.
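To make the distinction concrete, here is a minimal sketch comparing the two numbers. It is written for Python 3 on CPython, where a binary-mode read returns a bytes object; the `line` value is a hypothetical stand-in for one record rather than data from the actual file.

```python
import sys

# Hypothetical stand-in for one 2957-byte record read from the file.
line = b"\x00" * 2957

payload_length = len(line)         # number of bytes in the record
object_size = sys.getsizeof(line)  # memory used by the whole Python object

print(payload_length)                # 2957
print(object_size > payload_length)  # True on CPython
```

If the goal is to check that a full record was read, len(line) is the value to compare against 2957.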
See the stringobject.h definition of the type:
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
where PyObject_VAR_HEAD is defined in object.h, where the standard ob_refcnt, ob_type and ob_size fields are all defined.
So a string of length 2957 takes 2958 bytes (string length + null) and the remaining 20 bytes you see are to hold the reference count, the type pointer, the object 'size' (string length here), the cached string hash and the interned state flag.
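The fixed overhead described above can be observed directly. The sketch below assumes CPython and uses Python 3 bytes objects; the helper name `overhead` is made up for illustration. It subtracts the payload length from the reported size for several lengths and checks that the difference stays the same. The actual value depends on the interpreter version and platform (in the question it is 2978 - 2957 = 21, the 20 header bytes plus the trailing null).

```python
import sys

def overhead(n):
    """Bytes that sys.getsizeof() reports beyond the n payload bytes."""
    return sys.getsizeof(b"x" * n) - n

# The per-object overhead should not depend on the payload length.
overheads = [overhead(n) for n in (1, 100, 2957)]
is_constant = len(set(overheads)) == 1

print(overheads)
print(is_constant)
```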
Other object types will have different memory footprints, and the exact sizes of the C types used differ from platform to platform as well.
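One way to see the platform dependence is to ask the struct module for the native sizes of the C types that appear in the header. This is only a rough sketch: it reports raw type sizes and ignores any padding the compiler may add, and the dictionary keys are labels chosen for this example. On a build where each of these types is 4 bytes, the five header fields add up to 20 bytes, which matches the figure quoted above.

```python
import struct

# Native sizes (in bytes) of the C types used in the object header.
native_sizes = {
    "int": struct.calcsize("i"),
    "long": struct.calcsize("l"),
    "pointer": struct.calcsize("P"),
    "ssize_t": struct.calcsize("n"),
}

for name, size in native_sizes.items():
    print(name, size)
```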