Problem description
I'm working with a process which is basically as follows:
- Take some list of URLs.
- Get a Response object from each.
- Create a BeautifulSoup object from the text of each Response.
- Pull the text of a certain tag from that BeautifulSoup object.
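The four steps above can be sketched offline as follows. This is only an illustration under stated assumptions: it swaps in the standard library's html.parser for BeautifulSoup and a dictionary of made-up pages for real HTTP responses, so it runs without network access. The names TagTextParser, tag_text, scrape, and FAKE_PAGES are hypothetical and not part of the original post.

```python
from html.parser import HTMLParser


class TagTextParser(HTMLParser):
    """Collect the text inside the first occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.done = False   # True once the first match has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def tag_text(html, tag):
    """Steps 3 and 4: parse the HTML and pull the text of `tag`."""
    parser = TagTextParser(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def scrape(urls, fetch, tag):
    """Steps 1 and 2: walk the URL list and fetch each page's HTML."""
    return [tag_text(fetch(url), tag) for url in urls]


# Made-up pages standing in for real responses (assumption, not real data).
FAKE_PAGES = {
    "https://example.com/a": "<html><body><h1>Page A</h1><p>x</p></body></html>",
    "https://example.com/b": "<html><body><h1>Page B</h1></body></html>",
}

headings = scrape(list(FAKE_PAGES), FAKE_PAGES.__getitem__, "h1")
print(headings)
```

With real pages, `fetch` could be something like `lambda u: requests.get(u).text`, and the parsing step would be done with BeautifulSoup as described in the list.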
From my understanding, this seems ideal for grequests.
However, the two processes (one with requests, one with grequests) give me different results: some of the requests made through grequests return None rather than a response.
Using requests
import requests
tickers = [
    'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI',
    'ADM', 'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN',
    'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
]
BASE = 'https://finance.google.com/finance?q={}'
rs = (requests.get(u) for u in [BASE.format(t) for t in tickers])
rs = list(rs)
rs
# [<Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# ...
# <Response [200]>]
# All are okay (status_code == 200)
Using grequests
# Restarted my interpreter and redefined `tickers` and `BASE`
import grequests
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs)
rs
# [None,
# <Response [200]>,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# None,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>,
# <Response [200]>]
Why are the results different?
Update: I can print the exception type as follows. Related discussion here but I have no idea what's going on.
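def exception_handler(request, exception):
    print(exception)
# Rebuild the requests; rs currently holds the results of the earlier map() call
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs, exception_handler=exception_handler)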
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
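Printing the exception shows what failed but not which request. A small variation, sketched here, collects the failing URLs so they can be inspected or retried later. The handler name record_failure and the failures list are hypothetical, and reading `request.url` assumes that the request object grequests passes to the handler exposes a `url` attribute; the demonstration below uses a stand-in object rather than a real grequests request.

```python
from types import SimpleNamespace

failures = []  # (url, error message) pairs collected by the handler


def record_failure(request, exception):
    """Callback with the (request, exception) shape used by exception_handler."""
    # getattr() guards against the `url` attribute being absent (assumption).
    failures.append((getattr(request, "url", None), str(exception)))


# Offline demonstration with a stand-in request object (not a real grequests one).
fake_request = SimpleNamespace(url="https://finance.google.com/finance?q=A")
record_failure(fake_request, OSError("bad handshake"))
print(failures)
```

It would be passed in the same way as the handler above: `grequests.map(rs, exception_handler=record_failure)`.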
System/version info
- requests: 2.18.4
- grequests: 0.3.0
- Python: 3.6.3
- urllib3: 1.22
- pyopenssl: 17.2.0
- All via Anaconda
- System: same issue on both Mac OSX HS & Windows 10, build 10.0.16299
Recommended answer
You are simply sending requests too fast. Because grequests is an async library, all of these requests are sent almost simultaneously, and there are too many of them.
You just need to limit the number of concurrent tasks with grequests.map(rs, size=your_choice). I have tested grequests.map(rs, size=10) and it works well.
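To illustrate what capping concurrency means, here is a rough standard-library analogy. It is not how grequests implements `size`; it uses a thread pool with `max_workers` in place of a bounded pool of greenlets, and a fake fetch function in place of real HTTP calls, so it runs offline. The names bounded_map, fake_fetch, and state are made up for this sketch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def bounded_map(func, items, size):
    """Apply func to every item with at most `size` calls in flight at once."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        # pool.map() yields results in the same order as the inputs.
        return list(pool.map(func, items))


# Shared counters used to observe how many calls overlap.
state = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_fetch(ticker):
    """Stand-in for an HTTP request: sleeps briefly instead of hitting a server."""
    with lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.01)
    with lock:
        state["active"] -= 1
    return "response for " + ticker


tickers = ['A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI']
results = bounded_map(fake_fetch, tickers, size=3)
print(len(results), "results, peak concurrency:", state["peak"])
```

However many items are submitted, the peak number of overlapping calls never exceeds `size`, which is the effect the answer relies on to keep the server from dropping connections.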