Problem description
I am writing some Python client code and, due to some environmental constraints, I want to specify a URL and also control how it is resolved. I can accomplish this with curl by using the --resolve flag. Is there a way to do something similar with Python's requests library?
Ideally this would work in Python 2.7 but I can make a 3.x solution work as well.
Recommended answer
After doing a bit of digging, I (unsurprisingly) found that Requests resolves hostnames by asking Python to do it (which in turn asks your operating system). First I found some sample code for hijacking DNS resolution (see "Tell urllib2 to use custom DNS"), and then I worked out a few more details about how Python resolves hostnames from the socket documentation. Then it was just a matter of wiring everything together:
import socket
import requests

def is_ipv4(s):
    # Feel free to improve this: https://stackoverflow.com/questions/11827961/checking-for-ip-addresses
    return ':' not in s

dns_cache = {}

def add_custom_dns(domain, port, ip):
    key = (domain, port)
    # Strange parameters explained at:
    # https://docs.python.org/2/library/socket.html#socket.getaddrinfo
    # Each entry is (family, socktype, proto, canonname, sockaddr), modelled on
    # the output of `socket.getaddrinfo(...)`.
    # socket.AF_INET / socket.AF_INET6 exist on both Python 2 and 3
    # (socket.AddressFamily is Python 3 only). SOCK_STREAM because HTTP runs over TCP.
    if is_ipv4(ip):
        value = (socket.AF_INET, socket.SOCK_STREAM, 0, '', (ip, port))
    else:  # ipv6
        value = (socket.AF_INET6, socket.SOCK_STREAM, 0, '', (ip, port, 0, 0))
    dns_cache[key] = [value]

# Inspired by: https://stackoverflow.com/a/15065711/868533
prv_getaddrinfo = socket.getaddrinfo

def new_getaddrinfo(*args, **kwargs):
    # Uncomment to see what calls to `getaddrinfo` look like.
    # print(args)
    try:
        return dns_cache[args[:2]]  # hostname and port
    except KeyError:
        # Not overridden: fall back to the real resolver.
        return prv_getaddrinfo(*args, **kwargs)

# Replace the resolver for the whole process.
socket.getaddrinfo = new_getaddrinfo

# Redirect example.com to the IP of test.domain.com (completely unrelated).
add_custom_dns('example.com', 80, '66.96.162.92')
res = requests.get('http://example.com')
print(res.text)  # Prints out the HTML of test.domain.com.
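The snippet above replaces socket.getaddrinfo for the rest of the process. If that is too broad, one option is to scope the override with a context manager. This is only a minimal sketch and not part of the original answer: the names custom_dns and overrides are hypothetical, and it handles IPv4 only.

```python
import contextlib
import socket


@contextlib.contextmanager
def custom_dns(overrides):
    """Temporarily resolve selected (host, port) pairs to fixed IPv4 addresses.

    `overrides` is a hypothetical mapping such as
    {('example.com', 80): '66.96.162.92'}.
    """
    original = socket.getaddrinfo

    def patched(host, port, *args, **kwargs):
        ip = overrides.get((host, port))
        if ip is None:
            # Not overridden: defer to the real resolver.
            return original(host, port, *args, **kwargs)
        # Same 5-tuple shape that socket.getaddrinfo returns:
        # (family, socktype, proto, canonname, sockaddr)
        return [(socket.AF_INET, socket.SOCK_STREAM, 0, '', (ip, port))]

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original


with custom_dns({('example.com', 80): '66.96.162.92'}):
    # Answered from the mapping, so no DNS lookup happens here.
    inside = socket.getaddrinfo('example.com', 80)
```

A requests.get('http://example.com') call placed inside the with block would connect to the overridden address. The patch is still process-wide while the block is active, so other threads resolving the same host and port at that moment would see the override too.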
Some caveats I ran into while writing this:
- This works poorly for https. The code itself works fine (just use https:// and 443 instead of http:// and 80). However, SSL certificates are tied to domain names, and Requests will still validate the certificate against the original domain in the URL, so verification fails unless the server at the substituted IP presents a certificate for that domain.
- getaddrinfo returns slightly different info for IPv4 and IPv6 addresses. My implementation of is_ipv4 feels hacky to me, and I strongly recommend a better version if you're using this in a real application.
- The code has been tested on Python 3, but I see no reason why it wouldn't work as-is on Python 2.
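On the is_ipv4 caveat, one possible stricter check is to let the standard library parse the address with socket.inet_pton. This is a sketch rather than the original author's code: ip_family is a hypothetical helper name, and it assumes inet_pton is available on your platform (it is documented for Unix, and for Windows from Python 3.4).

```python
import socket


def ip_family(address):
    """Return socket.AF_INET or socket.AF_INET6 for a literal IP address.

    Raises ValueError if `address` is not a valid IPv4 or IPv6 literal.
    """
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            # inet_pton raises an error if the text is not valid for this family.
            socket.inet_pton(family, address)
            return family
        except (socket.error, ValueError):
            continue
    raise ValueError('not an IP address: %r' % (address,))


def is_ipv4(address):
    # Unlike the ':' test, this rejects strings that are not IP addresses at all.
    return ip_family(address) == socket.AF_INET
```

With this version, passing a hostname or a typo as the IP raises ValueError up front instead of silently being cached as an IPv4 entry.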