Python requests - navigating a site by using the server's IP


Problem description

I want to crawl a site, but Cloudflare is getting in the way. I was able to get the server's IP, so Cloudflare won't bother me.

How can I use this in the requests library?

For example, I want to go directly to www.example.com/foo.php, but requests will resolve the hostname to an IP on the Cloudflare network instead of the one I want it to use. How can I make it use the IP I want?

I could send a request to the real IP with the host set to www.example.com, but that just gives me the home page. How can I visit other links on the site?

Recommended answer

You will have to set a custom Host header with the value example.com, something like:

requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})

should do the trick. If you want to verify that, run the following command (requires netcat): nc -l -p 80 and then run the above command. It will produce output in the netcat window:

GET /foo.php HTTP/1.1
Host: example.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
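The same check can be done without netcat, and extended to more than one page. The following is a minimal sketch, not a definitive implementation: it assumes plain HTTP only, uses a throwaway local server from Python's standard library in place of the real origin server, and sets the Host header once on a requests.Session so that every path requested through it carries the same header. The names RecordingHandler and seen, and the path /bar.php, are illustrative and do not come from the original answer.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

seen = []  # (path, Host header) pairs recorded by the local test server


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the origin server: records what it receives."""

    def do_GET(self):
        seen.append((self.path, self.headers.get("Host")))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# Port 0 lets the OS pick a free port, so no root privileges are needed.
server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False                 # ignore proxy environment variables
session.headers["Host"] = "example.com"   # sent with every request on this session

# Connect by IP, but ask for the site by name via the Host header.
base_url = "http://127.0.0.1:{}".format(port)
statuses = [session.get(base_url + path).status_code
            for path in ("/foo.php", "/bar.php")]

server.shutdown()
server.server_close()
session.close()

for path, host in seen:
    print(path, host)
```

Against a real site, base_url would be the origin server's IP instead of 127.0.0.1. Links found in the pages should be rewritten the same way (IP in the URL, name in the Host header), since a URL that contains the hostname goes through normal DNS resolution again.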
