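So I run the code below, and when I call queue.qsize() after it has run, there are still about 450,000 items left in the queue, which means most lines of the text file were never read. Any idea what's going on here?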
```python
from Queue import Queue
from threading import Thread

lines = 660918 #int(str.split(os.popen('wc -l HGDP_FinalReport_Forward.txt').read())[0]) -1
queue = Queue()
File = 'HGDP_FinalReport_Forward.txt'
num_threads =10
short_file = open(File)

class worker(Thread):
    def __init__(self,queue):
        Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            try:
                self.queue.get()
                i = short_file.readline()
                self.queue.task_done() #signal to the queue that the task is done
            except:
                break

## This is where I should make the call to the threads

def main():
    for i in range(num_threads):
        worker(queue).start()
    queue.join()

    for i in range(lines): # put the range of the number of lines in the .txt file
        queue.put(i)

main()
```
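The accepted answer doesn't diagnose the original code, but two things in it likely explain the backlog. First, `main()` calls `queue.join()` before anything has been put on the queue, so `join()` returns immediately; `main()` then enqueues the line numbers and returns while the workers are still draining them, so `qsize()` checked right afterwards shows a large remainder. Second, once the queue is empty each worker's `get()` blocks forever (the bare `except` never fires), so the threads never exit. Below is a minimal sketch, in Python 3, that keeps the question's one-queue-item-per-line design but fills the queue first, stops the workers with `None` sentinels, and only then joins. `read_lines_threaded` and its parameters are hypothetical names, and the lock around the shared file handle is a precaution of this sketch, not something the question uses.

```python
import threading
from queue import Queue


def read_lines_threaded(path, num_lines, num_threads=10):
    """Hypothetical helper mirroring the question's design: each queue item
    stands for one line, and a worker reads that line from a shared file."""
    q = Queue()
    lock = threading.Lock()  # guards the shared file handle and the results list
    results = []

    with open(path) as shared_file:

        def worker():
            while True:
                item = q.get()
                try:
                    if item is None:  # sentinel: tells this worker to stop
                        return
                    with lock:
                        results.append(shared_file.readline())
                finally:
                    q.task_done()  # called exactly once per get()

        threads = [threading.Thread(target=worker) for _ in range(num_threads)]
        for t in threads:
            t.start()

        # Enqueue all the work first, then one sentinel per worker ...
        for i in range(num_lines):
            q.put(i)
        for _ in range(num_threads):
            q.put(None)

        q.join()  # ... and only then wait until every item has been processed
        for t in threads:
            t.join()

    return results
```

For example, `read_lines_threaded('HGDP_FinalReport_Forward.txt', 660918)` would return the file's lines in order. Because every `readline()` happens under the lock, the threads add no real parallelism here, which is part of why the answer recommends a different approach.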
Best answer
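It's hard to know exactly what you're trying to do here, but if each line can be processed independently, `multiprocessing` is a much simpler option, and it takes care of all the synchronization for you. An added bonus is that you don't need to know the number of lines in advance.
Basically: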
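```python
import multiprocessing

def process(line):
    return len(line) # or whatever

if __name__ == '__main__':
    # define process() before creating the Pool so the worker processes can find it
    pool = multiprocessing.Pool(10)
    path = 'HGDP_FinalReport_Forward.txt'
    with open(path) as lines:
        results = pool.map(process, lines)
```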
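Or, if you just want some kind of aggregate result from the lines, you can use `reduce` to keep memory usage down. Feeding it from `pool.imap` rather than `pool.map` means the per-line results are folded in as they arrive instead of first being collected into one list (reusing `pool`, `process` and `path` from above):
```python
import operator
from functools import reduce  # a builtin in Python 2; must be imported in Python 3

with open(path) as lines:
    # imap yields results lazily, so reduce never holds the full list of results
    result = reduce(operator.add, pool.imap(process, lines))
```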
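A self-contained sketch of the `reduce` approach, assuming the per-line work is just the line length (`line_length` and `total_length` are hypothetical names). It deliberately feeds `reduce` from `pool.imap` rather than `pool.map`, since `map` first collects every result into a list; the `chunksize` argument batches lines per task to cut inter-process overhead, and the value 1000 is an arbitrary choice.

```python
import operator
from functools import reduce


def line_length(line):
    """Stand-in per-line work (the answer's `process`): length of the line."""
    return len(line)


def total_length(path, pool, chunksize=1000):
    """Fold per-line results into one number as they arrive.

    `pool` is any object with an imap(func, iterable, chunksize) method, e.g.
    multiprocessing.Pool or the thread-backed multiprocessing.dummy.Pool.
    """
    with open(path) as lines:
        # imap yields results lazily and in input order; reduce adds them one
        # at a time, so no list of per-line results is ever built.
        # The initial value 0 makes an empty file return 0 instead of raising.
        return reduce(operator.add, pool.imap(line_length, lines, chunksize), 0)
```

With real processes you would create `multiprocessing.Pool(10)` inside an `if __name__ == '__main__':` guard and pass it in; `multiprocessing.dummy.Pool` has the same interface but uses threads, which is handy for a quick check that doesn't need pickling.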
Source: "python - Readline and threading" on Stack Overflow: https://stackoverflow.com/questions/10919327/