Why is iterating through a large Django QuerySet consuming massive amounts of memory?



The table in question contains roughly ten million rows.

for event in Event.objects.all():
    print(event)

This causes memory usage to increase steadily to 4 GB or so, at which point the rows print rapidly. The lengthy delay before the first row printed surprised me – I expected it to print almost instantly.

I also tried Event.objects.iterator(), which behaved the same way.


I don't understand what Django is loading into memory or why it is doing this. I expected Django to iterate through the results at the database level, which would mean the results would be printed at a roughly constant rate (rather than all at once after a lengthy wait).



(I don't know whether it's relevant, but I'm using PostgreSQL.)


Nate C was close, but not quite.


  • Iteration. A QuerySet is iterable, and it executes its database query the first time you iterate over it. For example, this will print the headline of all entries in the database:

for e in Entry.objects.all():
    print(e.headline)


So your ten million rows are retrieved, all at once, when you first enter that loop and get the iterating form of the queryset. The wait you experience is Django loading the database rows and creating objects for each one, before returning something you can actually iterate over. Then you have everything in memory, and the results come spilling out.
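To make that behaviour concrete, here is a toy model of it. This is only a sketch and not Django's actual code: ToyQuerySet, fake_query and the calls list are hypothetical names made up for the illustration. The point is that the first iteration builds the complete list before handing back anything to loop over.

```python
class ToyQuerySet:
    """Toy stand-in for a lazily evaluated query; not Django's real implementation."""

    def __init__(self, fetch_all_rows):
        # fetch_all_rows is a callable that plays the role of the database query.
        self._fetch_all_rows = fetch_all_rows
        self._result_cache = None

    def __iter__(self):
        if self._result_cache is None:
            # First iteration: every row is loaded into a list
            # before a single one is handed to the caller.
            self._result_cache = list(self._fetch_all_rows())
        return iter(self._result_cache)


calls = []


def fake_query():
    # Records each time the "database" is hit.
    calls.append("query executed")
    return range(5)


events = ToyQuerySet(fake_query)
first_pass = list(events)   # triggers the query and fills the cache
second_pass = list(events)  # served from the cache, no second query
```

With ten million rows in place of five, that list() call is where the long wait and the memory growth would come from in this model.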


From my reading of the docs, iterator() does nothing more than bypass QuerySet's internal caching mechanisms. I think it might make sense for it to do a one-by-one thing, but that would conversely require ten million individual hits on your database. Maybe not all that desirable.
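There is a middle ground between loading everything at once and hitting the database once per row: run a single query and pull rows from the cursor in batches. The sketch below uses Python's standard sqlite3 module rather than Django or PostgreSQL, purely so that it runs on its own. fetch_in_batches and the event table are hypothetical names. Whether this really bounds memory depends on the driver, since some drivers buffer the whole result on the client unless a server-side cursor is requested.

```python
import sqlite3


def fetch_in_batches(cursor, batch_size=1000):
    """Yield rows from an already executed DB-API cursor, batch_size rows at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        for row in rows:
            yield row


# In-memory SQLite table standing in for the real events table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO event (name) VALUES (?)",
    [("event %d" % i,) for i in range(10)],
)

cursor = conn.execute("SELECT id, name FROM event ORDER BY id")
batched_rows = list(fetch_in_batches(cursor, batch_size=4))
```

Only one query is issued here, and at most batch_size rows are requested from the cursor per fetch.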


Iterating over large datasets efficiently is something we still haven't gotten quite right, but there are some snippets out there you might find useful for your purposes:

  • Memory Efficient Django QuerySet iterator
  • batch querysets
  • QuerySet Foreach
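The general idea behind snippets of this kind can be sketched as follows: walk the table in primary-key order and fetch a limited slice per query. This is a minimal sketch, not the code from any of the snippets listed above. iterate_in_chunks is a hypothetical helper, and it assumes the model has an orderable primary key and that the queryset has not already been sliced.

```python
def iterate_in_chunks(queryset, chunk_size=1000):
    """Yield objects from queryset in primary-key order, one chunk per query.

    Hypothetical helper, not part of Django. Assumes an orderable primary key.
    """
    ordered = queryset.order_by("pk")
    last_pk = None
    while True:
        # Resume after the last primary key seen in the previous chunk.
        page = ordered if last_pk is None else ordered.filter(pk__gt=last_pk)
        # Slicing limits the query, so at most chunk_size objects are
        # built and held in memory at a time.
        chunk = list(page[:chunk_size])
        if not chunk:
            return
        for obj in chunk:
            yield obj
        last_pk = chunk[-1].pk
```

Usage would look like `for event in iterate_in_chunks(Event.objects.all(), chunk_size=1000): print(event)`. Ten million rows then cost about ten thousand queries instead of one huge one or ten million tiny ones, and rows start printing after the first chunk arrives.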


10-19 21:40