Problem description
I'm running into memory issues when issuing a large Solr query from a Python script. I'm using the solrpy library to talk to the Solr server. The query returns approximately 80,000 records. Immediately after issuing the query, the Python process's memory footprint, as viewed through top, balloons to ~190MB.
PID   USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
8225  root  16  0   193m  189m  3272  S  0.0   11.2  0:11.31  python
...
At this point, the heap profile as viewed through heapy looks like this:
Partition of a set of 163934 objects. Total size = 14157888 bytes.
Index  Count  %   Size     %   Cumulative  %   Kind (class / dict of class)
0      80472  49  7401384  52  7401384     52  unicode
1      44923  27  3315928  23  10717312    76  str
...
The unicode objects represent the unique identifiers of the records returned by the query. Note that the total heap size is only 14MB while Python is occupying 190MB of physical memory. Once the variable storing the query results goes out of scope, the heap profile correctly reflects the garbage collection:
Partition of a set of 83586 objects. Total size = 6437744 bytes.
Index  Count  %   Size     %   Cumulative  %   Kind (class / dict of class)
0      44928  54  3316108  52  3316108     52  str
However, the memory footprint remains unchanged:
PID   USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
8225  root  16  0   195m  192m  3432  S  0.0   11.3  0:13.46  python
...
Why is there such a large disparity between Python's physical memory footprint and the size of the Python heap?
Recommended answer
Python allocates Unicode objects from the C heap. So when you allocate many of them (along with other malloc blocks) and then release most of them except for the very last one, C malloc will not return any memory to the operating system, because the C heap can only shrink at the end (not in the middle). Releasing the last Unicode object frees the block at the end of the C heap, which then allows malloc to return it all to the system.
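To make the "shrinks only at the end" behaviour concrete, here is a toy model in Python. It is only an illustration: ToyHeap and its methods are hypothetical names invented for this sketch, and real malloc implementations are far more elaborate (for example, they may serve large requests from separately mapped regions).

```python
class ToyHeap:
    """Toy model of a contiguous heap that can only give memory back from its top end.

    Hypothetical illustration only; this is not how any real malloc is implemented.
    """

    def __init__(self):
        # One entry per block, in address order: True = in use, False = free.
        self.blocks = []

    def alloc(self):
        """Reuse the lowest free block if there is one, otherwise grow the heap."""
        for address, used in enumerate(self.blocks):
            if not used:
                self.blocks[address] = True
                return address
        self.blocks.append(True)
        return len(self.blocks) - 1

    def free(self, address):
        """Mark a block free, then trim any run of free blocks at the top."""
        self.blocks[address] = False
        while self.blocks and not self.blocks[-1]:
            self.blocks.pop()  # only the end of the heap can be returned

    @property
    def footprint(self):
        """Blocks the 'operating system' still sees as belonging to the process."""
        return len(self.blocks)

    @property
    def in_use(self):
        """Blocks that actually hold live data."""
        return sum(self.blocks)


if __name__ == "__main__":
    heap = ToyHeap()
    addresses = [heap.alloc() for _ in range(1000)]

    # Free everything except the block at the very end of the heap.
    for address in addresses[:-1]:
        heap.free(address)
    print(heap.in_use, heap.footprint)   # 1 1000

    # Freeing the last block lets the whole heap shrink.
    heap.free(addresses[-1])
    print(heap.in_use, heap.footprint)   # 0 0
```

In this model a single live block at the top keeps the footprint at 1000 blocks even though only one is in use, which is the same shape as the 14MB heap versus 190MB resident size in the question.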
On top of that, Python also maintains a pool of freed unicode objects for faster allocation. So when the last Unicode object is freed, it isn't returned to malloc right away, which leaves all the other pages stuck.
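If you want to watch the gap between Python-level object sizes and the process footprint yourself, a rough sketch follows. It assumes Linux (it reads VmRSS from /proc/self/status) and Python 3, whose allocator differs from the Python 2 interpreter in the question, so the numbers you see may look quite different. The helper names rss_kb and measure, and the made-up record identifiers, are hypothetical and are not part of solrpy or heapy.

```python
import sys


def rss_kb():
    """Return the resident set size in kB from /proc/self/status, or None if unavailable."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return None


def measure(n=80000):
    """Allocate n identifier-like strings, drop them, and report memory at each step."""
    before = rss_kb()
    # Stand-in for the query results: n made-up record identifiers.
    ids = ["record-%012d" % i for i in range(n)]
    payload_bytes = sum(sys.getsizeof(s) for s in ids)
    during = rss_kb()
    del ids
    after = rss_kb()
    return {
        "before_kb": before,
        "during_kb": during,
        "after_kb": after,
        "payload_bytes": payload_bytes,
    }


if __name__ == "__main__":
    for name, value in measure().items():
        print(name, value)
```

Comparing after_kb with before_kb shows how much memory the process kept after the strings were released, and payload_bytes gives the Python-level size of the strings for comparison. Whether memory is actually retained depends on the interpreter version and the C library's malloc.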