python - How do I parallelize I/O-bound operations in Python?

I'm processing tweets with tweepy:

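import json

from tweepy import Stream
from tweepy.streaming import StreamListener  # available in tweepy < 4.0

class StdOutListener(StreamListener):
    def on_data(self, data):
        # data is the raw JSON string for one tweet
        process(json.loads(data))
        return True

l = StdOutListener()
stream = Stream(auth, l)
stream.filter(track=utf_words)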

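The process function fetches the content of the URLs included in the tweet (with requests), processes the data with nltk (which I guess takes some CPU), and then saves the results to Mongo.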

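The problem is that fetching the content of the included URLs takes a long time, which limits my processing speed. How can I speed this up in a Pythonic way?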

Best answer

You can use Python's threading module:

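import threading
from functools import reduce  # reduce is no longer a builtin in Python 3

class YourThreadSubclass(threading.Thread):
    def __init__(self, your_args):
        threading.Thread.__init__(self)
        # do whatever setup you want
        self.some_property = your_args
        self.result_field = None

    def run(self):
        # keep the result on the thread object so it can be read after join()
        self.result_field = process_data(self.some_property)

# Iterable, process_data and combiner are placeholders for your own inputs and functions
threads = [YourThreadSubclass(args) for args in Iterable]
for t in threads:
    t.start()
for t in threads:
    t.join()
result = reduce(combiner, (t.result_field for t in threads))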

More information here: http://docs.python.org/2/library/threading.html
Edit: more directly, you can spawn a thread each time on_data is called.

def on_data(self, data):
    YourThreadSubclass(data).start()

The spawned thread will store its result asynchronously.
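A minimal, self-contained sketch of the thread-per-call pattern, for illustration only. `fake_process`, `results`, `FakeListener` and `run_demo` are hypothetical stand-ins that are not part of tweepy or the original code; `time.sleep` simulates the slow URL fetch and a plain list stands in for Mongo.

```python
import json
import threading
import time

results = []                     # hypothetical in-memory store standing in for Mongo
results_lock = threading.Lock()  # protects `results` across threads


def fake_process(tweet):
    """Hypothetical stand-in for process(): simulate a slow fetch, then store."""
    time.sleep(0.1)  # pretend this is the slow HTTP request
    with results_lock:
        results.append(tweet["id"])


class FakeListener:
    """Hypothetical listener with the same on_data shape as in the question."""

    def __init__(self):
        self.threads = []

    def on_data(self, data):
        # Hand the work to a new thread and return immediately.
        t = threading.Thread(target=fake_process, args=(json.loads(data),))
        t.start()
        self.threads.append(t)
        return True


def run_demo(n=5):
    """Feed n fake tweets through the listener and return the stored ids."""
    with results_lock:
        results.clear()
    listener = FakeListener()
    for i in range(n):
        listener.on_data(json.dumps({"id": i}))
    for t in listener.threads:
        t.join()  # wait only so the demo can inspect the results
    return sorted(results)
```

Because each call to on_data returns as soon as the thread is started, the simulated fetches overlap instead of running one after another. The lock matters only because the stand-in store is a shared list.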
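If you are handling a large number of requests, you may also want to use a thread pool to manage the threads (the answer links to documentation for this).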
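One way to sketch the thread pool idea is with `concurrent.futures.ThreadPoolExecutor` from the standard library (Python 3.2+), which may not be what the answer's link pointed to. `fetch_and_process`, `PooledListener`, `run_pool_demo` and the `max_workers=8` default are assumptions made for illustration, and `time.sleep` again simulates the slow fetch.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_and_process(tweet):
    """Hypothetical stand-in for process(): simulate slow I/O, return a result."""
    time.sleep(0.05)  # pretend this is the slow URL fetch
    return tweet["id"] * 2


class PooledListener:
    """Hypothetical listener that submits work to a bounded pool of threads."""

    def __init__(self, max_workers=8):
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.futures = []

    def on_data(self, data):
        # submit() queues the job and returns a Future without blocking.
        future = self.pool.submit(fetch_and_process, json.loads(data))
        self.futures.append(future)
        return True

    def close(self):
        # Wait for queued jobs to finish, then collect results in submit order.
        self.pool.shutdown(wait=True)
        return [f.result() for f in self.futures]


def run_pool_demo(n=20):
    """Feed n fake tweets through the pooled listener and return the results."""
    listener = PooledListener(max_workers=8)
    for i in range(n):
        listener.on_data(json.dumps({"id": i}))
    return listener.close()
```

Unlike spawning one thread per tweet, the pool caps the number of concurrent fetches at `max_workers`, and extra jobs wait in the executor's queue. On a long-running stream, keeping every Future in a list would grow without bound, so a real listener would likely discard them or attach a callback instead.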

Regarding "python - How do I parallelize I/O-bound operations in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18818967/