How to use Python requests to fake a browser visit (a.k.a. generate a User-Agent)?

Problem description

I want to get the content from this website.

If I use a browser like Firefox or Chrome, I get the real page I want, but if I use the Python requests package (or the wget command) to fetch it, it returns a totally different HTML page.

I suspect the website's developers have put some blocks in place for this.

Question

How do I fake a browser visit using Python requests or the wget command?

Solution

Provide a User-Agent header:

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)
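
To see why the header matters, it can help to compare what requests sends by default with what it sends once a browser-like User-Agent is set. The following is a minimal sketch, not part of the original answer: `BROWSER_HEADERS` and `make_browser_session` are hypothetical names chosen for illustration, and the request is only prepared, never sent, so nothing here touches the network.

```python
import requests

# Same User-Agent string as in the example above.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/39.0.2171.95 Safari/537.36"
    ),
}


def make_browser_session(headers=BROWSER_HEADERS):
    """Return a Session that sends the given headers on every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update(headers)
    return session


# What requests identifies itself as when no User-Agent is supplied:
# a string of the form "python-requests/<version>".
print(requests.utils.default_user_agent())

session = make_browser_session()

# Build the request without sending it, to inspect the outgoing headers.
prepared = session.prepare_request(
    requests.Request("GET", "http://www.ichangtou.com/")
)
print(prepared.headers["User-Agent"])
```

Using a Session means the header is set once and reused for every later `session.get(...)` call, instead of passing `headers=` each time.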

FYI, here is a list of User-Agent strings for different browsers:


As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:

fake-useragent

Up-to-date simple useragent faker with real-world database

Demo:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'
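
If installing a third-party package is not an option, the idea behind `ua.random` can be roughly approximated with the standard library alone. This is only a sketch under stated assumptions: `USER_AGENTS` and `random_headers` are hypothetical names, the pool is hand-maintained (here it just reuses the two strings from the demo above), and it is not how fake-useragent itself is implemented.

```python
import random

# Hand-maintained pool of User-Agent strings (an assumption for this sketch;
# these two are copied from the fake-useragent demo output above).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
]


def random_headers(pool=USER_AGENTS):
    """Return a headers dict with a User-Agent picked at random from the pool."""
    return {"User-Agent": random.choice(pool)}


headers = random_headers()
print(headers["User-Agent"])
```

The resulting dict can be passed as `headers=` to `requests.get`, exactly like the hard-coded `headers` in the earlier example.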


06-16 23:21