Problem description
I have written a Python script to validate URL connectivity from a host. A request that succeeds (HTTP 200) with curl on Linux is reported as a 403 by the Python (3.6) requests module.
Can someone help me understand why the reported HTTP status codes differ?
Curl from the Linux command line:
$ curl -ILs https://www.h2o.ai|egrep ^HTTP
HTTP/1.1 200 OK
Python requests module:
>>> import requests
>>> url = 'https://www.h2o.ai'
>>> r = requests.get(url, verify=True, timeout=3)
>>> r.status_code
403
>>> requests.packages.urllib3.disable_warnings()
>>> r = requests.get(url, verify=False, timeout=3)
>>> r.status_code
403
It seems the site serves a 403 response to the python-requests/<version> User-Agent:
In [98]: requests.head('https://www.h2o.ai', headers={'User-Agent': 'Foo bar'})
Out[98]: <Response [200]>
In [99]: requests.head('https://www.h2o.ai')
Out[99]: <Response [403]>
You can contact the site owner if you want, or just send a different user agent via the User-Agent header (as I did above).
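If the check runs against many URLs, setting the header once on a session may be more convenient than passing it on every call. The following is only a minimal sketch: the `CUSTOM_USER_AGENT` value and the `make_session` and `check_url` helpers are hypothetical names introduced for illustration, not part of requests or of the original script.

```python
import requests

# Hypothetical identifier for this script; any descriptive value works.
CUSTOM_USER_AGENT = "url-connectivity-check/0.1"


def make_session(user_agent=CUSTOM_USER_AGENT):
    """Return a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    # Replace the default "python-requests/<version>" value.
    session.headers.update({"User-Agent": user_agent})
    return session


def check_url(url, session, timeout=3):
    """Return the status code of a HEAD request, following redirects."""
    # allow_redirects=True mirrors curl's -L; head() does not follow
    # redirects by default.
    response = session.head(url, allow_redirects=True, timeout=timeout)
    return response.status_code
```

Calling `check_url('https://www.h2o.ai', make_session())` would then send the custom header; whether the site answers 200 depends on its filtering rules at the time.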
How I debugged this:
I ran curl with the -v (--verbose) option to check the headers being sent, then checked the same for requests using response.request (assuming the response is saved as response).
I did not find any significant difference apart from the User-Agent header; hence, changing the User-Agent header worked as I expected.
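The requests side of that comparison can be sketched without touching the network by preparing a request and reading its headers. This is an illustrative sketch only; `outgoing_headers` is a hypothetical helper name. For a response you already have, `response.request.headers` shows the same information.

```python
import requests


def outgoing_headers(url, extra_headers=None):
    """Return the headers requests would send for a GET, without sending it."""
    session = requests.Session()
    request = requests.Request("GET", url, headers=extra_headers)
    # prepare_request merges the session defaults (User-Agent, Accept, ...)
    # with the per-request headers, as session.get() would.
    prepared = session.prepare_request(request)
    return prepared.headers


default = outgoing_headers("https://www.h2o.ai")
custom = outgoing_headers("https://www.h2o.ai", {"User-Agent": "Foo bar"})

print(default["User-Agent"])  # python-requests/<version>
print(custom["User-Agent"])   # Foo bar
```

Comparing this output with the request headers printed by `curl -v` makes the User-Agent difference easy to spot.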