Question
A (long) while ago I wrote a web spider that I multithreaded so that requests could run concurrently. That was in my Python youth, in the days before I knew about the GIL and the woes it creates for multithreaded code (i.e., most of the time stuff just ends up serialized!)...
I'd like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+, or I could go for a reactor / event-based model of some sort. I would rather do the latter since it's far simpler and less error-prone.
So the question is which framework would best suit my needs. Here are the options I know about so far:
- Twisted: The granddaddy of Python reactor frameworks, but it seems complex and a bit bloated. Steep learning curve for a small task.
- Eventlet: From the guys at lindenlab. Greenlet-based framework that's geared towards these kinds of tasks. I had a look at the code though and it's not too pretty: non-pep8 compliant, scattered with prints (why do people do this in a framework!?), API seems a little inconsistent.
- PyEv: Immature, and nobody seems to be using it right now, though it is based on libevent so it's got a solid backend.
- asyncore: From the stdlib: über low-level, seems like a lot of legwork involved just to get something off the ground.
- tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that's not what it was intended for. [edit: doesn't run on Windows unfortunately, which counts it out for me - it's a requirement for me to support this lame platform]
Is there anything I have missed? Surely there must be a library out there that hits the sweet spot of a simplified async networking library!
Recommended answer
I liked the concurrence Python module, which relies on either Stackless Python microthreads or Greenlets for lightweight threading. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it should be nearly as efficient as a real asynchronous server.
I suppose it's similar to Eventlet in this way.
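To make the "single loop" idea concrete, here is a minimal sketch using only the standard library's `selectors` module. It is not concurrence's or Eventlet's API (those hide the loop behind greenlets); it only shows one loop servicing several sockets as they become readable. The function name `run_echo_loop` and the use of `socket.socketpair()` in place of real network connections are my own assumptions for illustration.

```python
import selectors
import socket


def run_echo_loop(messages):
    """Service several sockets from one readiness loop; return what each reader received."""
    sel = selectors.DefaultSelector()
    received = {}
    writers = []

    for name, payload in messages.items():
        # socketpair() stands in for a real connection so the sketch needs no network.
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # `data` carries the label used to file the result later.
        sel.register(reader, selectors.EVENT_READ, data=name)
        writer.sendall(payload)
        writers.append(writer)

    # One loop handles every socket: no thread per connection.
    while len(received) < len(messages):
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became readable in time; give up instead of spinning
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return received


results = run_echo_loop({"a": b"hello", "b": b"world"})
```

Greenlet-based libraries build on the same mechanism: a task that would block is suspended, the loop waits for readiness, and the task is resumed when its socket has data.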
The downside is that its API is quite different from Python's socket/threading modules; you need to rewrite a fair bit of your application (or write a compatibility shim layer).
It seems that there's also cogen, which is similar but uses Python 2.5's enhanced generators for its coroutines instead of Greenlets. This makes it more portable than concurrence and other alternatives. Network I/O is done directly with epoll/kqueue/iocp.
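For anyone unfamiliar with generator-based coroutines, here is a rough sketch of the general technique in plain Python. It is not cogen's API: `fetcher`, `run_all` and `fake_fetch` are hypothetical names, and the "fetch" is faked so nothing touches the network. Each task yields a request, and a round-robin scheduler resumes it later with the result via `send()`.

```python
from collections import deque


def fetcher(name, urls, log):
    """A toy task: yield each URL and receive the (fake) response body back."""
    for url in urls:
        body = yield url  # suspended here until the scheduler calls send()
        log.append((name, body))


def fake_fetch(url):
    """Stand-in for real I/O; a real scheduler would wait on a socket here."""
    return "body of " + url


def run_all(tasks):
    """Round-robin scheduler: resume each generator until all are exhausted."""
    queue = deque((task, None) for task in tasks)
    while queue:
        task, value = queue.popleft()
        try:
            request = task.send(value)  # the first send(None) starts the generator
        except StopIteration:
            continue  # task finished; drop it
        queue.append((task, fake_fetch(request)))


log = []
run_all([fetcher("a", ["u1", "u2"], log), fetcher("b", ["u3"], log)])
```

The two tasks interleave without threads or greenlets, which is why this style works on any interpreter that supports generators.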