Curl and Python Requests (get) report different HTTP status codes


Problem description

I have written a Python script to validate URL connectivity from a host. A URL that curl on Linux reports as successful (HTTP 200) is reported as a 403 by the Python (3.6) requests module.

I'm hoping someone can help me understand why the reported HTTP status codes differ.

curl from the Linux command line:

$ curl -ILs https://www.h2o.ai|egrep ^HTTP
HTTP/1.1 200 OK

Python requests module:

>>> import requests
>>> url = 'https://www.h2o.ai'
>>> r = requests.get(url, verify=True, timeout=3)
>>> r.status_code
403
>>> requests.packages.urllib3.disable_warnings()
>>> r = requests.get(url, verify=False, timeout=3)
>>> r.status_code
403

Solution

It seems the site serves a 403 response to the default python-requests/<version> User-Agent:

In [98]: requests.head('https://www.h2o.ai', headers={'User-Agent': 'Foo bar'})
Out[98]: <Response [200]>

In [99]: requests.head('https://www.h2o.ai')
Out[99]: <Response [403]>

You can contact the site owner if you want, or just send a different user agent via the User-Agent header (as I did above).
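If the check runs against many URLs, one way to apply the workaround above is to set the header once on a requests.Session so every request carries it. This is only a minimal sketch: the helper names make_session and check_status and the user-agent string are made up for illustration, and whether a given site accepts a particular User-Agent is something you would have to test.

```python
import requests


def make_session(user_agent: str) -> requests.Session:
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    # Session-level headers are merged into each request the session sends.
    session.headers.update({"User-Agent": user_agent})
    return session


def check_status(url: str, session: requests.Session, timeout: float = 3.0) -> int:
    """Send a HEAD request and return the final HTTP status code."""
    # curl -IL follows redirects; requests.head() does not by default,
    # so redirects are enabled explicitly to keep the two comparable.
    response = session.head(url, allow_redirects=True, timeout=timeout)
    return response.status_code
```

For example, check_status('https://www.h2o.ai', make_session('connectivity-check/1.0')) would send the HEAD request with the hypothetical user agent connectivity-check/1.0 instead of the python-requests default.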

How I debugged this:

I ran curl with the -v (--verbose) option to check the headers being sent, and then checked the same for requests using response.request (assuming the response is saved as response).

I did not find any significant difference apart from the User-Agent header; hence, changing the User-Agent header worked as I expected.
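The requests side of that header comparison can be sketched as follows. To avoid depending on the network, this version prepares a request without sending it; the helper name default_request_headers is made up for illustration. After a real call, response.request.headers holds the headers that were actually sent.

```python
import requests


def default_request_headers(url: str) -> dict:
    """Return the headers requests would send for a GET to url, without sending it."""
    session = requests.Session()
    # prepare_request() merges the session's default headers into the request.
    prepared = session.prepare_request(requests.Request("GET", url))
    return dict(prepared.headers)


headers = default_request_headers("https://www.h2o.ai")
for name, value in headers.items():
    print(f"{name}: {value}")
```

The User-Agent line printed here starts with python-requests/, which is the value to compare against the User-Agent that curl -v shows in its request output.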
