Why doesn't sys.getsizeof() return [size] for the result of file.read([size]) in Python?

Problem description

I have a large binary file that I would like to read in and unpack using struct.unpack(). The file consists of a number of lines, each 2957 bytes long. I read in the file using the following code:

with open("bin_file", "rb") as f:
    line = f.read(2957)

My question is: why is the size returned by

import sys
sys.getsizeof(line)

not equal to 2957 (in my case it is 2978)?

Solution

You misunderstand what sys.getsizeof() does. It returns the amount of memory Python uses for the string object, not the length of the line. To get the number of bytes that were read, use len(line).

Python string objects track reference counts, the object type and other metadata together with the actual characters, so 2978 bytes is not the same thing as the string length.
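The difference is easy to observe by calling len() and sys.getsizeof() on the same object. The following is a minimal sketch, assuming Python 3 and using an in-memory io.BytesIO buffer filled with zero bytes as a stand-in for the binary file (the question used Python 2 and a real file, so the exact overhead printed here will likely differ from the figure in the question). The names RECORD_SIZE and fake_file are illustrative only.

```python
import io
import sys

RECORD_SIZE = 2957  # bytes per record, as in the question

# Stand-in for the binary file: two records of zero bytes (hypothetical data).
fake_file = io.BytesIO(bytes(RECORD_SIZE * 2))

line = fake_file.read(RECORD_SIZE)

payload_len = len(line)            # number of bytes actually read
object_size = sys.getsizeof(line)  # memory used by the whole Python object

print("len(line)           =", payload_len)
print("sys.getsizeof(line) =", object_size)
print("per-object overhead =", object_size - payload_len)
```

len() is the value to compare against the size passed to read(); sys.getsizeof() is only useful when the question is how much memory the object occupies.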

See the stringobject.h definition of the type:

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;

Here PyObject_VAR_HEAD is a macro defined in object.h; it declares the standard ob_refcnt, ob_type and ob_size fields.

So a string of length 2957 takes 2958 bytes (the string length plus a terminating null byte), and the remaining 20 bytes you see hold the reference count, the type pointer, the object 'size' (the string length here), the cached string hash and the interned state flag.
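Because the header fields do not grow with the data, the gap between sys.getsizeof() and len() should be the same for every length. The sketch below checks this on Python 3 bytes objects; it assumes CPython, where the bytes header is laid out differently from the Python 2 PyStringObject shown above, so the constant it prints is not expected to be 21. The helper name overhead is hypothetical.

```python
import sys

def overhead(n):
    """Bytes that sys.getsizeof() reports beyond the n payload bytes."""
    data = bytes(n)  # n zero bytes
    return sys.getsizeof(data) - len(data)

lengths = [1, 100, 2957]
overheads = [overhead(n) for n in lengths]

for n, extra in zip(lengths, overheads):
    print(f"length {n:>5}: overhead {extra} bytes")
```

If the three printed overheads match, the extra bytes are a fixed per-object cost rather than something proportional to the amount of data read.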

Other object types will have different memory footprints, and the exact sizes of the C types used differ from platform to platform as well.
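To see how the header size depends on the platform, the struct can be mirrored with ctypes and the offset of ob_sval measured. This is only a rough sketch: the class PyStringObjectLayout and the helper expected_size are hypothetical, the field types are an assumption based on the definition quoted above (with ob_refcnt and ob_size taken to be Py_ssize_t, as in a release build of Python 2.5 or later), and nothing here inspects a real interpreter object.

```python
import ctypes

class PyStringObjectLayout(ctypes.Structure):
    """Hypothetical ctypes mirror of the Python 2 PyStringObject fields."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count
        ("ob_type", ctypes.c_void_p),     # pointer to the type object
        ("ob_size", ctypes.c_ssize_t),    # string length
        ("ob_shash", ctypes.c_long),      # cached hash
        ("ob_sstate", ctypes.c_int),      # interned state flag
        ("ob_sval", ctypes.c_char * 1),   # first byte of the character data
    ]

# Everything stored before the character data counts as header.
header_bytes = PyStringObjectLayout.ob_sval.offset

def expected_size(length):
    """Header, plus the characters, plus the terminating null byte."""
    return header_bytes + length + 1

print("header bytes on this platform:", header_bytes)
print("expected size for 2957 characters:", expected_size(2957))
```

On a 32-bit build, where each of the five header fields is 4 bytes wide, this gives a 20-byte header and a total of 2978 bytes, matching the figures in the question. On 64-bit builds the header is larger.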


07-29 23:55