Problem description
Consider this Python program:
import sys

lc = 0
# Iterate over the file line by line and count the lines
for line in open(sys.argv[1]):
    lc += 1
print(lc, sys.argv[1])
Running it on my 6 GB text file, it completes in about 2 minutes.
Question: is it possible to go any faster?
Note that the same time is required by:
wc -l myfile.txt
So I suspect the answer to my question is just a plain "no".
Note also that my real program does something more interesting than just counting lines, so please give a generic answer, not line-counting tricks (like keeping line-count metadata in the file).
PS: I tagged this question "linux" because I'm interested only in Linux-specific answers. Feel free to give OS-agnostic, or even other-OS, answers if you have them.
See also the follow-up question.
Recommended answer
You can't get any faster than the maximum disk read speed.
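For scale, the figures in the question (about 6 GB in about 2 minutes) work out to roughly 50 MB/s. A minimal sketch for measuring what a plain sequential read achieves on a given file is shown here; the function name read_throughput and the 16 MiB chunk size are assumptions chosen for illustration, not something from the question. Keep in mind that a second run over the same file may be served from the OS page cache and look much faster than the disk really is.

```python
import time


def read_throughput(path, chunk_size=16 * 1024 * 1024):
    """Return (bytes_read, seconds, MB_per_second) for a raw sequential read.

    chunk_size is an assumed value; tune it for your hardware.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 opens a raw file object, so each read() goes to the OS
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s
```

If the number reported here is close to what the full program achieves, the program is already I/O bound and there is little left to gain.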
In order to reach the maximum disk speed you can use the following two tips:
- Read the file in with a big buffer. This can either be coded "manually" or done simply by using io.BufferedReader (available in Python 2.6+).
- Do the newline counting in another thread, in parallel.
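The two tips above can be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's own code: the function names, the 16 MiB chunk size and the queue depth are all made up for the example, and it counts newline bytes the way wc -l does (a final line without a trailing newline is not counted). The threaded variant can overlap reading and counting because CPython releases the GIL while blocked in a file read; how much that helps in practice depends on the workload.

```python
import queue
import threading

CHUNK_SIZE = 16 * 1024 * 1024  # assumed value (16 MiB); tune for your disk


def count_lines_buffered(path, chunk_size=CHUNK_SIZE):
    """Tip 1: read the file in large binary chunks and count newline bytes."""
    count = 0
    # buffering=0 gives a raw file object; we do the large-block reads ourselves
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            count += chunk.count(b"\n")
    return count


def count_lines_threaded(path, chunk_size=CHUNK_SIZE, max_pending=4):
    """Tip 2: the main thread reads, a worker thread counts newlines."""
    chunks = queue.Queue(maxsize=max_pending)  # bounded to limit memory use
    result = []

    def worker():
        count = 0
        while True:
            chunk = chunks.get()
            if chunk is None:  # sentinel: the reader is done
                break
            count += chunk.count(b"\n")
        result.append(count)

    t = threading.Thread(target=worker)
    t.start()
    try:
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                chunks.put(chunk)
    finally:
        chunks.put(None)  # always release the worker, even on error
        t.join()
    return result[0]
```

For a real program that does more than count lines, the worker function is the place where the per-chunk processing would go; lines that straddle a chunk boundary would then need to be stitched together, which this sketch does not have to do.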