Problem description
I want to crawl a site, but Cloudflare is getting in the way. I was able to find the server's real IP, so if I connect to it directly Cloudflare won't interfere.
How can I use that IP with the requests library?
For example, I want to go directly to www.example.com/foo.php, but requests will resolve the hostname to an IP on the Cloudflare network instead of the one I want. How can I make it use the IP I choose?
I could send a request to the real IP with the host set to www.example.com, but that just gives me the home page. How can I visit other links on the site?
Recommended answer
You will have to set a custom Host header with the value example.com, something like:
requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})
That should do the trick. To reach other pages, change the path in the URL and keep the same Host header. If you want to verify this, start a listener with the following command (requires netcat): nc -l -p 80
Then run the request above. It will produce output like this in the netcat window:
GET /foo.php HTTP/1.1
Host: example.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
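
If netcat is not available, the same check can be sketched in Python alone. The example below is a minimal sketch that uses only the standard library (http.server and urllib.request) rather than requests, so it runs without extra packages. The names fetch_via_ip, RecordingHandler, and seen are hypothetical helpers invented for this illustration, and /bar.php is a made-up second path. It starts a throwaway server on 127.0.0.1, requests two different paths with the Host header forced to example.com, and records what the server actually received.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # (path, Host header) pairs recorded by the local test server


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that stands in for the netcat listener."""

    def do_GET(self):
        seen.append((self.path, self.headers.get("Host")))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


# Ignore any proxy settings from the environment so we really hit the IP.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_via_ip(ip, port, host, path):
    """GET `path` from ip:port while presenting `host` in the Host header."""
    req = urllib.request.Request(
        f"http://{ip}:{port}{path}", headers={"Host": host}
    )
    with opener.open(req, timeout=5) as resp:
        return resp.status


server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Different pages are reached by changing only the path.
statuses = [
    fetch_via_ip("127.0.0.1", port, "example.com", path)
    for path in ("/foo.php", "/bar.php")
]

server.shutdown()
server.server_close()
print(seen)
```

With requests, the equivalent call is the one shown in the answer: put the IP and the path in the URL and pass headers={'host': 'example.com'}.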