


I have a Python application that grabs a collection of data and performs a task for each piece of data in that collection. The task takes some time to complete because there is a delay involved. Because of this delay, I don't want the tasks to run sequentially, one piece of data after another; I want them all to happen in parallel. Should I be using multiprocessing or threading for this operation?


I attempted to use threading but had some trouble: often some of the tasks would never actually fire.



If you are truly compute bound, using the multiprocessing module is probably the lightest-weight solution (in terms of both memory consumption and implementation difficulty).
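
For the compute-bound case, a minimal sketch using multiprocessing.Pool might look like the following. The names cpu_task and run_in_processes are made up for illustration, and the sum of squares is only a stand-in for whatever real work each piece of data needs.

```python
from multiprocessing import Pool


def cpu_task(n):
    # Hypothetical stand-in for CPU-heavy work on one piece of data.
    return sum(i * i for i in range(n))


def run_in_processes(items, processes=2):
    # Pool.map splits the items across worker processes and
    # returns the results in the same order as the inputs.
    with Pool(processes=processes) as pool:
        return pool.map(cpu_task, items)


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # this module, unguarded pool creation would run again in each worker.
    print(run_in_processes([10, 100, 1000]))
```

Note that the function handed to the pool has to be defined at module level so that it can be pickled and sent to the worker processes.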


If you are I/O bound, using the threading module will usually give you good results. Make sure that you use thread-safe storage (like the Queue) to hand data to your threads, or else hand each thread a single piece of data that is unique to it when it is spawned.
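
For the I/O-bound case, here is a rough sketch of the Queue approach. The names slow_task and run_in_threads are hypothetical, and time.sleep stands in for the real delay (a network call, for example). Joining every thread before returning means the caller does not move on until each queued item has been handled.

```python
import queue
import threading
import time

_STOP = object()  # sentinel telling a worker there is no more work


def slow_task(item):
    # Hypothetical stand-in for the real task; the sleep mimics the delay.
    time.sleep(0.1)
    return item * 2


def worker(jobs, results):
    # Pull items off the shared queue until the sentinel arrives.
    while True:
        item = jobs.get()
        if item is _STOP:
            break
        results.put((item, slow_task(item)))


def run_in_threads(items, num_threads=4):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for item in items:
        jobs.put(item)
    for _ in threads:
        jobs.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()  # wait until every worker has finished
    # All workers are done, so draining the results queue here is safe.
    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected


if __name__ == "__main__":
    print(run_in_threads([1, 2, 3]))
```

This sketch assumes the items are hashable and distinct, since they are used as dictionary keys when collecting results.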

PyPy is focused on performance. It has a number of features that can help with compute-bound processing. It also has support for Software Transactional Memory, although that is not yet production quality. The promise is that you can use simpler parallel or concurrent mechanisms than multiprocessing (which has some awkward requirements).

Stackless Python is also a nice idea, although Stackless has portability issues, as indicated above. Unladen Swallow was promising, but is now defunct. Pyston is another (unfinished) Python implementation focusing on speed. It takes a different approach from PyPy, which may yield better (or just different) speedups.


07-23 22:13