Problem description
I have a very large table with 250,000+ rows, many of which contain a large text block in one of the columns. Right now it's 2.7GB, and it is expected to grow at least tenfold. I need to perform Python-specific operations on every row of the table, but I only need to access one row at a time.
Right now my code looks something like this:
    c.execute('SELECT * FROM big_table')
    table = c.fetchall()  # loads every row into memory at once
    for row in table:
        do_stuff_with_row(row)  # placeholder for the per-row processing
This worked fine when the table was smaller, but the table is now larger than my available RAM, and Python hangs when I try to run it. Is there a better (more RAM-efficient) way to iterate row by row over the entire table?
Recommended answer
cursor.fetchall() fetches all results into a list first, so the whole result set has to fit in memory before the loop even starts.
Instead, you can iterate over the cursor itself:
    c.execute('SELECT * FROM big_table')
    for row in c:
        do_stuff_with_row(row)  # placeholder for the per-row processing
This produces rows as needed, rather than loading them all first.
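If you would rather pull rows in fixed-size chunks, cursor.fetchmany() is another option. The following is only a minimal sketch under some assumptions: the helper name iter_rows, the batch_size value, and the small in-memory table with id and body columns are all made up for illustration and are not part of the original question.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time, fetching them from the cursor in batches.

    iter_rows and batch_size are hypothetical names used for this sketch.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        yield from batch


# Stand-in for the real database: a small in-memory table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO big_table (body) VALUES (?)",
    (("text %d" % i,) for i in range(2500)),
)
conn.commit()

c = conn.cursor()
c.execute("SELECT id, body FROM big_table")

row_count = 0
for row in iter_rows(c, batch_size=1000):
    # Per-row work goes here; at most one batch is held in memory at a time.
    row_count += 1

conn.close()
```

Plain iteration over the cursor is usually enough; batching like this mostly helps when you want explicit control over how many rows are held at once.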