Example of urllib3 and threading in Python

Problem description

I am trying to use urllib3 in simple threads to fetch several wiki pages. The script creates one connection for every thread (I don't understand why) and then hangs forever. Any tips, advice, or a simple example of urllib3 and threading would be appreciated.

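import threadpool
from urllib3 import connection_from_url

# `url` and `iterable` are not defined in the snippet as posted.
HTTP_POOL = connection_from_url(url, timeout=10.0, maxsize=10, block=True)

def fetch(url, fields):
  kwargs = {'retries': 6}
  # request('GET', ...) stands in for the older get_url(), which the
  # tracebacks below show is missing from HTTPConnectionPool.
  return HTTP_POOL.request('GET', url, fields=fields, **kwargs)

pool = threadpool.ThreadPool(5)
requests = threadpool.makeRequests(fetch, iterable)
for req in requests:
  pool.putRequest(req)
pool.wait()  # block until every queued request has been processed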

@Lennart's script produced this error:

http://en.wikipedia.org/wiki/2010-11_Premier_LeagueTraceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
 http://en.wikipedia.org/wiki/List_of_MythBusters_episodeshttp://en.wikipedia.org/wiki/List_of_Top_Gear_episodes http://en.wikipedia.org/wiki/List_of_Unicode_characters    result = request.callable(*request.args, **request.kwds)
  File "crawler.py", line 9, in fetch
    print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
    result = request.callable(*request.args, **request.kwds)
  File "crawler.py", line 9, in fetch
    print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
    result = request.callable(*request.args, **request.kwds)
  File "crawler.py", line 9, in fetch
    print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/threadpool.py", line 156, in run
    result = request.callable(*request.args, **request.kwds)
  File "crawler.py", line 9, in fetch
    print url, conn.get_url(url)
AttributeError: 'HTTPConnectionPool' object has no attribute 'get_url'

After adding import threadpool; import urllib3 and tpool = threadpool.ThreadPool(4), @user318904's code produced this error:

Traceback (most recent call last):
  File "crawler.py", line 21, in <module>
    tpool.map_async(fetch, urls)
AttributeError: ThreadPool instance has no attribute 'map_async'
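
The traceback shows that the ThreadPool class from the third-party threadpool package has no map_async method. The standard library's multiprocessing.pool.ThreadPool does provide one. The sketch below assumes that class was the one the snippet intended, which the thread does not confirm. The fetch function is a hypothetical stand-in that performs no network I/O, and fetch_all is an illustrative name.

```python
from multiprocessing.pool import ThreadPool


def fetch(url):
    """Hypothetical stand-in for a real download: returns the URL and its length."""
    return url, len(url)


def fetch_all(urls, workers=4):
    """Map fetch over urls on a pool of worker threads and wait for the results."""
    pool = ThreadPool(workers)
    try:
        # map_async returns immediately; get() blocks until all results are in.
        async_result = pool.map_async(fetch, urls)
        return async_result.get(timeout=30)
    finally:
        pool.close()
        pool.join()
```

The results come back in the same order as the input URLs, regardless of which thread finished first.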

Some remarks

My code is based on a similar example from the Python Cookbook by Beazley and Jones.
I particularly like the fact that you only need a standard module besides urllib3.
The setup is extremely simple, and if you are only going for side-effects in download (like printing, saving to a file, etc.), there is no additional effort in joining the threads.
If you want something different, ThreadPoolExecutor.submit actually returns whatever download would return, wrapped in a Future.
I found it helpful to align the number of threads in the thread pool with the number of HTTPConnection objects in the connection pool (via maxsize). Otherwise you might encounter (harmless) warnings when all threads try to access the same server (as in the example).
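
The approach these remarks describe can be sketched as follows. This is a minimal sketch and not the answerer's original code: the names make_pool and download_all, the WORKERS value, the retry count, and the timeout are all assumptions chosen for illustration. Only download, ThreadPoolExecutor.submit, and maxsize come from the remarks themselves.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import urllib3

WORKERS = 4  # assumed value, shared by the thread pool and the connection pool


def make_pool(base_url, workers=WORKERS):
    """Create one connection pool for base_url, sized to match the thread pool."""
    return urllib3.connection_from_url(
        base_url, maxsize=workers, block=True, timeout=10.0
    )


def download(pool, path):
    """Fetch one page through the shared pool; return (status, body length)."""
    response = pool.request("GET", path, retries=3)
    return response.status, len(response.data)


def download_all(pool, paths, workers=WORKERS):
    """Run download() for every path on worker threads and collect the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # submit() returns a Future that wraps whatever download() returns.
        futures = {executor.submit(download, pool, p): p for p in paths}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

As a usage example, download_all(make_pool("https://en.wikipedia.org"), ["/wiki/List_of_Top_Gear_episodes", "/wiki/List_of_Unicode_characters"]) would return a dict mapping each path to its status code and body size. Leaving the with block joins the worker threads automatically. Because maxsize equals the number of workers and block=True, a thread waits for a free connection instead of opening an extra one.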
