Understanding requests versus grequests

Question

I'm working with a process which is basically as follows:

  1. Take some list of urls.
  2. Get a Response object from each.
  3. Create a BeautifulSoup object from the text of each Response.
  4. Pull the text of a certain tag from that BeautifulSoup object.
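
Steps 3 and 4 can be sketched on their own, independent of how the responses are fetched. This is a minimal illustration, assuming the `bs4` package is installed; the helper name `tag_text`, the sample HTML, and the choice of the `title` tag are hypothetical stand-ins, not taken from the original code.

```python
from bs4 import BeautifulSoup


def tag_text(html, tag_name):
    """Return the stripped text of the first `tag_name` element, or None."""
    soup = BeautifulSoup(html, "html.parser")  # step 3: parse the response text
    tag = soup.find(tag_name)                  # step 4: locate the tag of interest
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical stand-in for `response.text`; a real page would come from
# a requests or grequests Response object.
sample_html = (
    "<html><head><title> Example quote page </title></head>"
    "<body></body></html>"
)

print(tag_text(sample_html, "title"))
print(tag_text(sample_html, "h1"))  # tag absent, so the helper returns None
```

Returning None for a missing tag keeps the helper safe to call on error pages, which matters once some requests start failing.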

From my understanding, this seems ideal for grequests:

Yet the two processes (one with requests, one with grequests) seem to give me different results, with some of the requests in grequests returning None rather than a response.

Using requests

import requests

tickers = [
    'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI', 
    'ADM',  'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN', 
    'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
    ]

BASE = 'https://finance.google.com/finance?q={}'

rs = (requests.get(u) for u in [BASE.format(t) for t in tickers])
rs = list(rs)

rs
# [<Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # ...
 # <Response [200]>]

# All are okay (status_code == 200)

Using grequests

# Restarted my interpreter and redefined `tickers` and `BASE`
import grequests

rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs)

rs
# [None,
 # <Response [200]>,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>]

Why the difference in results?

Update: I can print the exception type as follows. Related discussion here but I have no idea what's going on.

def exception_handler(request, exception):
    print(exception)

rs = grequests.map(rs, exception_handler=exception_handler)

# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
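
Printing the exception shows what went wrong but not which URL failed. A possible variation, sketched here under assumptions, is a handler that records each failure so the affected URLs can be retried later. The class name `FailureLog` and the `FakeRequest` stand-in are hypothetical; the sketch also assumes the request object passed to the handler exposes a `url` attribute, and falls back to None if it does not.

```python
class FailureLog:
    """Callable that records (url, exception) pairs for failed requests."""

    def __init__(self):
        self.failures = []

    def __call__(self, request, exception):
        # Called with the failed request and the exception that was raised.
        # `request.url` is an assumption; getattr avoids a crash if it is missing.
        self.failures.append((getattr(request, "url", None), exception))


# Intended use with grequests (not executed here):
#     log = FailureLog()
#     rs = grequests.map(rs, exception_handler=log)
#     retry_urls = [url for url, _ in log.failures]

# Offline demonstration with a stand-in request object.
class FakeRequest:
    url = "https://example.com/a"


log = FailureLog()
log(FakeRequest(), OSError("bad handshake"))
print(log.failures)
```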

System/version info

  • requests: 2.18.4
  • grequests: 0.3.0
  • Python: 3.6.3
  • urllib3: 1.22
  • pyopenssl: 17.2.0
  • All installed via Anaconda
  • System: same issue on both Mac OSX HS and Windows 10, build 10.0.16299
Recommended answer

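You are simply sending requests too fast. Because grequests is an asynchronous library, all of these requests are sent almost simultaneously, and that is too many at once. The "bad handshake ... Unexpected EOF" errors suggest that some connections are being dropped under that load.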

You just need to limit the number of concurrent tasks with grequests.map(rs, size=your_choice). I have tested grequests.map(rs, size=10) and it works well.
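
To make the effect of a concurrency cap concrete, here is a standard-library simulation. It does not use grequests: a `ThreadPoolExecutor` with `max_workers` plays the role that `size` plays in `grequests.map`, and `Probe.fetch` is a hypothetical stand-in for one HTTP request that only sleeps. The names `run_bounded` and `Probe` are illustrative.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(func, items, size):
    """Apply `func` to each item with at most `size` calls in flight.

    Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=size) as pool:
        return list(pool.map(func, items))


class Probe:
    """Fake fetcher that records the highest number of overlapping calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0

    def fetch(self, ticker):
        with self._lock:
            self._in_flight += 1
            self.peak = max(self.peak, self._in_flight)
        time.sleep(0.01)  # pretend to wait on the network
        with self._lock:
            self._in_flight -= 1
        return ticker.lower()


tickers = ["A", "AAL", "AAP", "AAPL", "ABBV", "ABC", "ABT", "ACN"]
probe = Probe()
results = run_bounded(probe.fetch, tickers, size=3)
print(results)
print("peak concurrency:", probe.peak)
```

With the cap in place, the recorded peak never exceeds the chosen size, no matter how many items are queued. Without a cap, all requests would start at roughly the same time, which is the situation the answer describes.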
