asyncio web scraping 101: fetching multiple URLs with aiohttp

Question

In an earlier question, one of the authors of aiohttp kindly suggested a way to fetch multiple URLs with aiohttp using the new async with syntax from Python 3.5:

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def fetch_all(session, urls, loop):
    results = await asyncio.wait([loop.create_task(fetch(session, url))
                                  for url in urls])
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # breaks because of the first url
    urls = ['http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
            'http://google.com',
            'http://twitter.com']
    with aiohttp.ClientSession(loop=loop) as session:
        the_results = loop.run_until_complete(
            fetch_all(session, urls, loop))
        # do something with the the_results

However, when one of the session.get(url) requests fails (as above, because of http://SDFKHSKHGKLHSKLJHGSDFKSJH.com), the error is not handled and the whole thing breaks.

I looked for ways to insert checks on the result of session.get(url), for instance a place for a try ... except ..., or for an if response.status != 200:, but I just do not understand how to work with async with, await and the various objects.

Since async with is still very new, there are not many examples. It would be very helpful to many people if an asyncio wizard could show how to do this. After all, one of the first things most people will want to try with asyncio is fetching multiple resources concurrently.

Goal

The goal is that we can inspect the_results and quickly see either:


  • This URL failed (and why: status code, maybe exception name), or

  • This URL worked, and here is a useful response object
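
As an illustration of the two outcomes listed above, here is a minimal sketch of a per-URL record. It assumes the fetching code hands back, for each URL and in the same order, either the response text or the exception object that was raised. The names FetchOutcome and to_outcomes are hypothetical and are not part of aiohttp or asyncio.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FetchOutcome:
    """Hypothetical per-URL record; not part of aiohttp or asyncio."""
    url: str
    body: Optional[str] = None             # response text when the fetch worked
    error: Optional[BaseException] = None  # exception object when it failed

    @property
    def ok(self) -> bool:
        return self.error is None

    def describe(self) -> str:
        if self.ok:
            return '{}: OK ({} characters)'.format(self.url, len(self.body or ''))
        return '{}: FAILED ({})'.format(self.url, type(self.error).__name__)


def to_outcomes(urls: List[str], results: list) -> List[FetchOutcome]:
    """Pair each URL with its result, assuming both lists share the same order."""
    outcomes = []
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            outcomes.append(FetchOutcome(url, error=result))
        else:
            outcomes.append(FetchOutcome(url, body=result))
    return outcomes
```

With a list like this, a caller can filter on outcome.ok or print outcome.describe() for every URL.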

Recommended answer

I would use gather instead of wait; with return_exceptions=True it returns exceptions as objects instead of raising them. You can then check whether each result is an instance of some exception.

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def fetch_all(session, urls, loop):
    results = await asyncio.gather(
        *[fetch(session, url) for url in urls],
        return_exceptions=True  # default is false, that would raise
    )

    # for testing purposes only
    # gather returns results in the order of coros
    for idx, url in enumerate(urls):
        print('{}: {}'.format(url, 'ERR' if isinstance(results[idx], Exception) else 'OK'))
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # breaks because of the first url
    urls = [
        'http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
        'http://google.com',
        'http://twitter.com']
    with aiohttp.ClientSession(loop=loop) as session:
        the_results = loop.run_until_complete(
            fetch_all(session, urls, loop))

Test:

$ python test.py
http://SDFKHSKHGKLHSKLJHGSDFKSJH.com: ERR
http://google.com: OK
http://twitter.com: OK
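
The question also asked where an if response.status != 200: check could go. One possible approach, sketched here under stated assumptions, is to raise an exception for a bad status inside the fetch coroutine so that gather reports it like any other failure. To keep the sketch runnable without network access, fake_fetch and FAKE_RESPONSES simulate the request, and BadStatus and summarize are hypothetical names; none of them come from aiohttp.

```python
import asyncio


class BadStatus(Exception):
    """Hypothetical exception for a non-200 response."""

    def __init__(self, status):
        super().__init__('unexpected status {}'.format(status))
        self.status = status


# Simulated responses standing in for the network: url -> (status, text).
FAKE_RESPONSES = {
    'http://ok.example': (200, 'hello'),
    'http://missing.example': (404, ''),
}


async def fake_fetch(url):
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    if url not in FAKE_RESPONSES:
        raise ConnectionError('cannot connect to {}'.format(url))
    status, text = FAKE_RESPONSES[url]
    if status != 200:
        raise BadStatus(status)  # turns a bad status into a reportable failure
    return text


async def fetch_all(urls):
    # Exceptions come back as objects, in the same order as the URLs.
    return await asyncio.gather(*[fake_fetch(u) for u in urls],
                                return_exceptions=True)


def summarize(urls, results):
    lines = []
    for url, result in zip(urls, results):
        if isinstance(result, BadStatus):
            lines.append('{}: ERR status {}'.format(url, result.status))
        elif isinstance(result, Exception):
            lines.append('{}: ERR {}'.format(url, type(result).__name__))
        else:
            lines.append('{}: OK'.format(url))
    return lines


if __name__ == '__main__':
    urls = ['http://unreachable.example', 'http://ok.example',
            'http://missing.example']
    for line in summarize(urls, asyncio.run(fetch_all(urls))):
        print(line)
```

In real aiohttp code the status check would sit inside the async with session.get(url) as response: block and read response.status. On recent aiohttp releases the snippets above also need adjusting: aiohttp.Timeout has been removed (aiohttp.ClientTimeout is its replacement) and ClientSession has to be entered with async with.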
