Is there an async equivalent of multiprocessing or concurrent.futures in Python?

Problem description

Basically, I'm looking for something that offers a parallel map using Python 3 coroutines as the backend instead of threads or processes. I believe there should be less overhead when performing highly parallel IO work.

Surely something similar already exists, be it in the standard library or some widely used package?

Recommended answer

DISCLAIMER: PEP 0492 defines only syntax and usage for coroutines. They require an event loop to run, which is most likely asyncio's event loop.

I don't know of any implementation of map based on coroutines. However, it's trivial to implement basic map functionality using asyncio.gather():

import asyncio

def async_map(coroutine_func, iterable):
    # gather() joins the per-item coroutines and preserves input order
    loop = asyncio.get_event_loop()
    future = asyncio.gather(*(coroutine_func(param) for param in iterable))
    return loop.run_until_complete(future)
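For illustration, here is a minimal usage sketch; fetch is a hypothetical coroutine standing in for real async I/O and is not part of the original answer:

import asyncio

async def fetch(url):
    # hypothetical coroutine standing in for real async I/O
    await asyncio.sleep(0.1)
    return url.upper()

results = async_map(fetch, ['a', 'b', 'c'])
print(results)  # ['A', 'B', 'C']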

This implementation is really simple: it creates a coroutine for each item in the iterable, joins them into a single coroutine, and executes the joined coroutine on the event loop.

The provided implementation covers part of the cases. However, it has a problem: with a long iterable you would probably want to limit the number of coroutines running in parallel. I can't come up with a simple implementation that is efficient and preserves order at the same time, so I will leave it as an exercise for the reader.
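One possible approach, not from the original answer: bound concurrency with an asyncio.Semaphore while still using asyncio.gather(), which returns results in argument order. A minimal sketch, with the caveat that all coroutine objects are still created up front, so it caps concurrent execution but not memory:

import asyncio

def async_map_bounded(coroutine_func, iterable, limit=100):
    semaphore = asyncio.Semaphore(limit)

    async def bounded(param):
        # at most `limit` coroutine bodies run concurrently
        async with semaphore:
            return await coroutine_func(param)

    loop = asyncio.get_event_loop()
    # gather() preserves argument order in its result list,
    # regardless of the order in which coroutines finish
    future = asyncio.gather(*(bounded(param) for param in iterable))
    return loop.run_until_complete(future)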

You claim:

"I believe there should be less overhead when performing highly parallel IO work."

It requires proof, so here is a comparison of the multiprocessing implementation, the gevent implementation by a p, and my implementation based on coroutines. All tests were performed on Python 3.5.

Implementation using multiprocessing:

from multiprocessing import Pool
import time


def async_map(f, iterable):
    with Pool(len(iterable)) as p:  # run one process per item to measure overhead only
        return p.map(f, iterable)

def func(val):
    time.sleep(1)
    return val * val

Implementation using gevent:

import gevent
from gevent.pool import Group


def async_map(f, iterable):
    group = Group()  # group.map() spawns one greenlet per item
    return group.map(f, iterable)

def func(val):
    gevent.sleep(1)
    return val * val

Implementation using asyncio:

import asyncio


def async_map(f, iterable):
    loop = asyncio.get_event_loop()
    future = asyncio.gather(*(f(param) for param in iterable))
    return loop.run_until_complete(future)

async def func(val):
    await asyncio.sleep(1)
    return val * val

Testing, as usual, was done with timeit:

$ python3 -m timeit -s 'from perf.map_mp import async_map, func' -n 1 'async_map(func, list(range(10)))'

Results:

Iterable of 10 items:

  • multiprocessing - 1.05 sec
  • gevent - 1 sec
  • asyncio - 1 sec

Iterable of 100 items:

  • multiprocessing - 1.16 sec
  • gevent - 1.01 sec
  • asyncio - 1.01 sec

Iterable of 500 items:

  • multiprocessing - 2.31 sec
  • gevent - 1.02 sec
  • asyncio - 1.03 sec

Iterable of 5000 items:

  • multiprocessing - failed (spawning 5k processes is not such a good idea!)
  • gevent - 1.12 sec
  • asyncio - 1.22 sec

Iterable of 50000 items:

  • gevent - 2.2 sec
  • asyncio - 3.25 sec

Conclusions

Event-loop-based concurrency works faster when a program does mostly I/O rather than computation. Keep in mind that the difference will be smaller when there is less I/O and more computation involved.

The overhead introduced by spawning processes is significantly bigger than the overhead introduced by event-loop-based concurrency. This means that your assumption is correct.

Comparing asyncio and gevent, we can say that asyncio has 33-45% bigger overhead. This means that creating greenlets is cheaper than creating coroutines.

As a final conclusion: gevent has better performance, but asyncio is part of the standard library. The difference in performance (in absolute numbers) isn't very significant. gevent is a fairly mature library, while asyncio is relatively new but advancing quickly.
