python - Readline and threads

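So I run the code below, and when I call queue.qsize() after running it, there are still around 450,000 items in the queue, which means most lines of the text file have not been read. Any idea what is going on here?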

from Queue import Queue
from threading import Thread

lines = 660918 #int(str.split(os.popen('wc -l HGDP_FinalReport_Forward.txt').read())[0]) -1
queue = Queue()
File = 'HGDP_FinalReport_Forward.txt'
num_threads =10
short_file = open(File)

class worker(Thread):
    def __init__(self,queue):
        Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            try:
                self.queue.get()
                i  = short_file.readline()
                self.queue.task_done() #signal to the queue that the task is done
            except:
                break

## This is where I should make the call to the threads

def main():
    for i in range(num_threads):
        worker(queue).start()
    queue.join()


    for i in range(lines): # put the range of the number of lines in the .txt file
        queue.put(i)

main()

Best answer

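It is hard to tell exactly what you are trying to do here, but if each line can be processed independently, multiprocessing is a much simpler option that takes care of all the synchronization for you. An added bonus is that you do not need to know the number of lines in advance.

Basically: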

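import multiprocessing

path = 'HGDP_FinalReport_Forward.txt'  # the file from the question

def process(line):
    return len(line)  # or whatever

if __name__ == '__main__':
    # Create the pool only after process() is defined, so the workers can find it.
    pool = multiprocessing.Pool(10)
    with open(path) as lines:
        results = pool.map(process, lines)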

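Alternatively, if you only want some aggregate result over the lines, you can fold the per-line results with reduce. Note that pool.map builds the complete result list before reduce ever sees it, so to actually keep memory usage down, use pool.imap, which yields results lazily.

import operator
from functools import reduce  # a builtin on Python 2, in functools on Python 3
with open(path) as lines:
    # chunksize batches several lines per task to cut inter-process overhead
    result = reduce(operator.add, pool.imap(process, lines, chunksize=1000))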
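
For comparison with the queue-based code in the question, here is a minimal threaded sketch of the same idea. It rests on a few assumptions that are not in the original post: the per-line work is just len(line), as in process() above; the code targets Python 3, where the module is named queue rather than Queue; and SENTINEL, worker and count_line_lengths are hypothetical names chosen for illustration. One likely contributor to the leftover items in the question is that queue.join() is called before anything has been put on the queue, so it returns immediately instead of waiting. The sketch puts the work on the queue first and joins afterwards, and only the main thread reads the file, so no file object is shared between threads.

```python
import threading
from queue import Queue  # Python 3 name; the question's Python 2 code imports Queue

SENTINEL = None  # hypothetical stop marker, one is sent to each worker


def worker(queue, results, lock):
    """Take lines off the queue until the stop marker arrives."""
    while True:
        line = queue.get()
        try:
            if line is SENTINEL:
                return
            with lock:
                results.append(len(line))  # placeholder work, like process()
        finally:
            queue.task_done()  # runs for real items and for the stop marker


def count_line_lengths(lines, num_threads=10):
    """Push every line through the queue and wait until all are processed."""
    queue = Queue(maxsize=1000)  # bounded, so the producer cannot run far ahead
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(queue, results, lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for line in lines:   # first enqueue the work
        queue.put(line)
    for _ in threads:    # then one stop marker per worker
        queue.put(SENTINEL)
    queue.join()         # only now wait for the queue to drain
    for t in threads:
        t.join()
    return results       # order is not guaranteed
```

Passing an open file, for example count_line_lengths(open('HGDP_FinalReport_Forward.txt')), should work because a file object iterates over its lines. For CPU-bound work the multiprocessing version above is still likely to be the better fit, since threads in CPython share one interpreter lock.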

Source: a similar question about python, Readline and threads on Stack Overflow: https://stackoverflow.com/questions/10919327/