I have some code that iterates over DB entities, and runs in a task - see below.
On App Engine I'm getting an Exceeded soft private memory limit error, and indeed checking memory_usage().current() confirms the problem. See below for output from the logging statement. It seems that every time a batch of foos is fetched, the memory goes up.
My question is: why is the memory not being garbage collected? I would expect that, in each iteration of the loops (the while loop and the for loop, respectively), the re-use of the names foos and foo would cause the objects to which foos and foo used to point to become 'de-referenced' (i.e. inaccessible), and therefore eligible for garbage collection, and then to be garbage collected as memory gets tight. But evidently that is not happening.
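That intuition about name re-binding is correct as far as it goes: in CPython, rebinding a name drops a reference, and an object with no remaining references is reclaimed immediately. A minimal sketch (using a made-up Blob class and weakref to observe collection):

```python
import weakref

class Blob(object):
    pass

obj = Blob()
probe = weakref.ref(obj)  # observe the object without keeping it alive
obj = Blob()              # rebind the name, as the loop rebinds foos/foo

# The first Blob had no remaining references, so CPython freed it at once:
print(probe() is None)
```

So rebinding foos and foo does release the previous objects; if memory still grows, something else must be holding references to them.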
from google.appengine.api.runtime import memory_usage

batch_size = 10
dict_of_results = {}
results = 0
cursor = None

while True:
    foos = models.Foo.all().filter('status =', 6)
    if cursor:
        foos.with_cursor(cursor)
    for foo in foos.run(batch_size=batch_size):
        logging.debug('on result #{} used memory of {}'.format(results, memory_usage().current()))
        results += 1
        bar = some_module.get_bar(foo)
        if bar:
            try:
                dict_of_results[bar.baz] += 1
            except KeyError:
                dict_of_results[bar.baz] = 1
        if results >= batch_size:
            cursor = foos.cursor()
            break
    else:
        break
and in some_module.py
def get_bar(foo):
    for bar in foo.bars:
        if bar.status == 10:
            return bar
    return None
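The same lookup can also be written as a one-liner with next() and a default. A self-contained sketch, with a toy Bar class standing in for the real model:

```python
class Bar(object):
    def __init__(self, status):
        self.status = status

def get_bar(bars):
    # first bar with status == 10, or None if there isn't one
    return next((bar for bar in bars if bar.status == 10), None)

bars = [Bar(5), Bar(10), Bar(7)]
print(get_bar(bars).status)
```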
Output of logging.debug (shortened)
on result #1 used memory of 43
on result #2 used memory of 43
.....
on result #20 used memory of 43
on result #21 used memory of 49
.....
on result #32 used memory of 49
on result #33 used memory of 54
.....
on result #44 used memory of 54
on result #45 used memory of 59
.....
on result #55 used memory of 59
.....
.....
.....
on result #597 used memory of 284.3
Exceeded soft private memory limit of 256 MB with 313 MB after servicing 1 requests total
It looks like your batching solution is conflicting with db's batching, resulting in a lot of extra batches hanging around.
When you run query.run(batch_size=batch_size), db runs the query through to completion of the entire limit. When you reach the end of a batch, db grabs the next batch. However, right after db does this, you exit the loop and start again. What this means is that batches 1 -> n will all exist in memory twice: once for the last query's fetch, and once for your next query's fetch.
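The effect can be seen in miniature: a paused iterator keeps its frame, and therefore everything it prefetched, alive until the iterator itself is dropped. A toy sketch with a generator standing in for query.run() (Entity is a made-up class):

```python
import weakref

class Entity(object):
    pass  # toy stand-in for a fetched model instance

def run(batch_size):
    # toy stand-in for query.run(): holds a prefetched batch in its frame
    batch = [Entity() for _ in range(batch_size)]
    for entity in batch:
        yield entity

it = run(3)
first = next(it)
probe = weakref.ref(first)
del first

# Breaking out of a for loop does not free the iterator's batch:
print(probe() is None)

del it  # dropping the iterator frees its frame, and with it the batch
print(probe() is None)
```

This prints False, then True: while the old iterator is still alive alongside the new query's data, both copies of the batch occupy memory at once.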
If you want to loop over all your entities, just let db handle the batching:
foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size=batch_size):
    results += 1
    bar = some_module.get_bar(foo)
    if bar:
        try:
            dict_of_results[bar.baz] += 1
        except KeyError:
            dict_of_results[bar.baz] = 1
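As an aside, the try/except KeyError counting pattern can be replaced with collections.defaultdict from the standard library. A minimal sketch with made-up baz values:

```python
from collections import defaultdict

dict_of_results = defaultdict(int)  # missing keys start at 0

for baz in ['a', 'b', 'a']:  # hypothetical bar.baz values
    dict_of_results[baz] += 1

print(sorted(dict_of_results.items()))
```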
Or, if you want to handle batching yourself, make sure db doesn't do any batching:
while True:
    foo_query = models.Foo.all().filter('status =', 6)
    if cursor:
        foo_query.with_cursor(cursor)
    foos = foo_query.fetch(limit=batch_size)
    if not foos:
        break
    # process this batch of foos here
    cursor = foo_query.cursor()  # fetch() returns a list, so take the cursor from the query