How to specify URL resolution in Python's requests library in a similar fashion to curl's --resolve flag?

Question

I am writing some python client code and, due to some environmental constraints, I want to specify a URL and also control how it is resolved. I can accomplish this with curl by using the --resolve flag. Is there a way to do something similar with Python's requests library?

Ideally this would work in Python 2.7 but I can make a 3.x solution work as well.

Recommended answer

After a bit of digging, I (unsurprisingly) found that Requests resolves hostnames by asking Python to do it, which in turn asks your operating system. First I found some sample code that hijacks DNS resolution (the question "Tell urllib2 to use custom DNS"), and then I worked out a few more details about how Python resolves hostnames from the socket documentation. After that it was just a matter of wiring everything together:

import socket
import requests

def is_ipv4(s):
    # Feel free to improve this: https://stackoverflow.com/questions/11827961/checking-for-ip-addresses
    return ':' not in s

dns_cache = {}

def add_custom_dns(domain, port, ip):
    key = (domain, port)
    # Strange parameters explained at:
    # https://docs.python.org/2/library/socket.html#socket.getaddrinfo
    # Values were taken from the output of `socket.getaddrinfo(...)`
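    # socket.AF_INET / socket.AF_INET6 exist on both Python 2 and 3
    # (the socket.AddressFamily enum is Python 3 only).
    # Tuple layout: (family, type, proto, canonname, sockaddr).
    if is_ipv4(ip):
        value = (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port))
    else:  # IPv6: sockaddr is (host, port, flowinfo, scope_id)
        value = (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port, 0, 0))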
    dns_cache[key] = [value]

# Inspired by: https://stackoverflow.com/a/15065711/868533
prv_getaddrinfo = socket.getaddrinfo
def new_getaddrinfo(*args, **kwargs):
    # Uncomment to see what calls to `getaddrinfo` look like.
    # print(args)
    try:
        return dns_cache[args[:2]]  # keyed on (hostname, port)
    except KeyError:
        # No override registered: fall back to the real resolver.
        return prv_getaddrinfo(*args, **kwargs)

socket.getaddrinfo = new_getaddrinfo

# Redirect example.com to the IP of test.domain.com (completely unrelated).
add_custom_dns('example.com', 80, '66.96.162.92')
res = requests.get('http://example.com')
print(res.text) # Prints out the HTML of test.domain.com.
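
Note that assigning to socket.getaddrinfo changes resolution for the whole process. As a minimal sketch of one way to limit that, the override can be wrapped in a context manager that restores the original function afterwards. The helper name resolve_override and the host example.test are made up for illustration, and the tuple layout (family, type, proto, canonname, sockaddr) is assumed from the socket documentation:

```python
import contextlib
import socket


@contextlib.contextmanager
def resolve_override(host, port, ip):
    """Temporarily resolve (host, port) to ip.

    Hypothetical helper for illustration; not part of Requests.
    """
    original = socket.getaddrinfo
    if ':' in ip:  # same rough IPv6 check as is_ipv4 above
        entry = (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP,
                 '', (ip, port, 0, 0))
    else:
        entry = (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP,
                 '', (ip, port))

    def patched(*args, **kwargs):
        # args[:2] is (host, port), the same key the code above uses.
        if args[:2] == (host, port):
            return [entry]
        return original(*args, **kwargs)

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original


real_getaddrinfo = socket.getaddrinfo

# Only lookups made inside the `with` block see the override.
with resolve_override('example.test', 80, '127.0.0.1'):
    inside = socket.getaddrinfo('example.test', 80)
```

A call such as requests.get('http://example.test') placed inside the with block should then connect to the substituted address, while code outside the block resolves names normally.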

Some caveats I ran into while writing this:

  • This works poorly for https. The code itself works fine (just use https:// and 443 instead of http:// and 80). However, SSL certificates are tied to domain names, and Requests will validate the name on the certificate against the original domain you tried to connect to, so the request fails unless the server at the substituted IP presents a certificate for that domain.
  • getaddrinfo returns slightly different info for IPv4 and IPv6 addresses. My implementation for is_ipv4 feels hacky to me and I strongly recommend a better version if you're using this in a real application.
  • The code has been tested on Python 3 but I see no reason why it wouldn't work as-is on Python 2.
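
For the is_ipv4 caveat, one possible replacement is the standard library's ipaddress module (available in Python 3; Python 2.7 would need a backport, which has not been tried here). The sketch below uses a hypothetical helper, make_addrinfo_entry, to build the getaddrinfo-style tuple, and the addresses are documentation-range placeholders:

```python
import ipaddress
import socket


def make_addrinfo_entry(ip, port):
    """Build one getaddrinfo-style tuple for a literal IP address.

    Hypothetical helper for illustration; not part of Requests.
    """
    # ip_address() raises ValueError for anything that is not a valid
    # IPv4/IPv6 literal, so typos fail early instead of being cached.
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP,
                '', (ip, port))
    # An IPv6 sockaddr carries two extra fields: flowinfo and scope_id.
    return (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP,
            '', (ip, port, 0, 0))


v4_entry = make_addrinfo_entry('192.0.2.10', 80)
v6_entry = make_addrinfo_entry('2001:db8::1', 443)
```

Inside add_custom_dns, the if/else on is_ipv4 could then be replaced by a single call along the lines of dns_cache[key] = [make_addrinfo_entry(ip, port)].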

