Problem description
In an earlier question, one of the authors of aiohttp kindly suggested a way to fetch multiple URLs with aiohttp using the new async with syntax from Python 3.5:
import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def fetch_all(session, urls, loop):
    results = await asyncio.wait([loop.create_task(fetch(session, url))
                                  for url in urls])
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # breaks because of the first url
    urls = ['http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
            'http://google.com',
            'http://twitter.com']
    with aiohttp.ClientSession(loop=loop) as session:
        the_results = loop.run_until_complete(
            fetch_all(session, urls, loop))
        # do something with the the_results
However, when one of the session.get(url) requests breaks (as above, because of http://SDFKHSKHGKLHSKLJHGSDFKSJH.com), the error is not handled and the whole thing breaks.
I looked for ways to insert tests on the result of session.get(url), for instance a place for a try ... except ..., or for an if response.status != 200:, but I just do not understand how to work with async with, await and the various objects.
Since async with is still very new, there are not many examples. It would be very helpful to many people if an asyncio wizard could show how to do this. After all, one of the first things most people will want to test with asyncio is getting multiple resources concurrently.
Goal
The goal is that we can inspect the_results and quickly see either:
- this URL failed (and why: status code, maybe exception name), or
- this URL worked, and here is a useful response object
Recommended answer
I would use gather instead of wait, because it can return exceptions as objects without raising them. You can then check each result to see whether it is an instance of some exception.
import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def fetch_all(session, urls, loop):
    results = await asyncio.gather(
        *[fetch(session, url) for url in urls],
        return_exceptions=True  # default is false, that would raise
    )

    # for testing purposes only
    # gather returns results in the order of coros
    for idx, url in enumerate(urls):
        print('{}: {}'.format(url, 'ERR' if isinstance(results[idx], Exception) else 'OK'))
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # breaks because of the first url
    urls = [
        'http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
        'http://google.com',
        'http://twitter.com']
    with aiohttp.ClientSession(loop=loop) as session:
        the_results = loop.run_until_complete(
            fetch_all(session, urls, loop))
Test:
$ python test.py
http://SDFKHSKHGKLHSKLJHGSDFKSJH.com: ERR
http://google.com: OK
http://twitter.com: OK
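
To get closer to the stated goal (seeing per URL whether it failed and why), the gather results can be paired with their URLs and labelled. The following is only a sketch under assumptions: fake_fetch, FakeHTTPError, fetch_all_labelled, demo and the URL http://example.com/missing are hypothetical names made up for illustration, and fake_fetch simulates failures without any network access so the snippet runs on its own. With real aiohttp, fetch could raise on non-200 statuses itself, for example by calling response.raise_for_status() before reading the body. The sketch also uses asyncio.run, which needs Python 3.7 or later.

```python
import asyncio


class FakeHTTPError(Exception):
    """Hypothetical error standing in for a non-200 response."""

    def __init__(self, status):
        super().__init__('HTTP status {}'.format(status))
        self.status = status


async def fake_fetch(url):
    """Hypothetical stand-in for fetch(session, url); performs no network I/O."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if 'SDFKHSKHGKLHSKLJHGSDFKSJH' in url:
        # simulate a host that cannot be resolved
        raise OSError('cannot resolve host')
    if url.endswith('/missing'):
        # simulate a response whose status is not 200
        raise FakeHTTPError(404)
    return 'body of {}'.format(url)


async def fetch_all_labelled(urls):
    """Return one (url, ok, detail) tuple per URL, in the order of urls."""
    results = await asyncio.gather(
        *[fake_fetch(url) for url in urls],
        return_exceptions=True)  # exceptions come back as result objects
    labelled = []
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            # failed: report the exception class name as the reason
            labelled.append((url, False, type(result).__name__))
        else:
            # worked: pass the fetched body through
            labelled.append((url, True, result))
    return labelled


def demo():
    urls = [
        'http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
        'http://google.com',
        'http://example.com/missing']  # hypothetical URL for the 404 case
    return asyncio.run(fetch_all_labelled(urls))
```

Calling demo() returns a list that can be scanned at a glance: entries whose second field is False carry the exception name, and the others carry the fetched text.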