Problem description
I want to get the content from this website.
If I use a browser like Firefox or Chrome, I get the real page I want, but if I use the Python requests package (or the wget command) to fetch it, I get a totally different HTML page.
I suspect the site's developers have put some blocks in place against this.
Question
How do I fake a browser visit using Python requests or the wget command?
Provide a User-Agent header:
import requests
url = 'http://www.ichangtou.com/#company:data_000008.html'
# Send a browser-like User-Agent instead of the default "python-requests/<version>".
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
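If several requests go to the same site, it may be more convenient to set the header once on a requests.Session. The sketch below does that and then prepares a request without sending it, so the outgoing headers can be inspected offline. The helper name build_session and the extra Accept and Accept-Language values are assumptions for illustration; whether this particular site checks anything beyond User-Agent is not known.

```python
import requests

# Same User-Agent string as in the example above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
)


def build_session(user_agent=BROWSER_UA):
    """Return a Session that sends browser-like headers on every request.

    Hypothetical helper: the Accept* values are typical browser defaults,
    chosen as an assumption, not something this site is known to require.
    """
    session = requests.Session()
    session.headers.update({
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
    })
    return session


url = "http://www.ichangtou.com/#company:data_000008.html"
session = build_session()

# Build the request without sending it, to see which headers would go out.
prepared = session.prepare_request(requests.Request("GET", url))
print(prepared.headers["User-Agent"])
```

To actually fetch the page, call session.get(url) instead of preparing the request by hand.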
FYI, here is a list of User-Agent strings for different browsers:
As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:
Demo:
>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'
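If installing fake-useragent is not an option, the idea of picking a random User-Agent can be roughly sketched with the standard library alone. The pool and the helper name random_headers below are hypothetical; the strings are just the ones already quoted in this answer (and dated by now), not a maintained real-world database like the one the package provides.

```python
import random

# Hypothetical pool: the User-Agent strings quoted earlier in this answer.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
]


def random_headers(rng=random):
    """Return a headers dict with a User-Agent picked at random from the pool."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

The resulting dict can be passed straight to requests.get(url, headers=headers), as in the first example.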