Problem description
I'm trying to write a scraper in Python to get some information from a page, such as the titles of the offers that appear on this page:
https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585
Right now I'm using the following code:
import bs4
import requests

def extract_source(url):
    # Download the page and return its HTML as text
    source = requests.get(url).text
    return source

def extract_data(source):
    # Parse the HTML and print every <title> element found
    soup = bs4.BeautifulSoup(source, 'html.parser')
    names = soup.find_all('title')
    for i in names:
        print(i)

extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))
But when I run this code, it gives me an error:
<title>Access Denied</title>
How can I fix this?
Recommended answer
As mentioned in the comments, you need to specify an allowed User-Agent and pass it as headers:
def extract_source(url):
    # Send a browser-like User-Agent instead of the default one that requests uses
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
    source = requests.get(url, headers=headers).text
    return source
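To confirm that the header change actually worked, it can help to check the response before parsing it. The following is a minimal sketch and not part of the original answer: `is_access_denied` is a hypothetical helper name, and it assumes the block page can be recognized either by an HTTP 403 status or by an "Access Denied" title like the one shown in the question. It deliberately uses only the standard library (a simple regular expression rather than BeautifulSoup) so it can be tried on plain strings without touching the network.

```python
import re

# Matches the contents of the first <title> element (case-insensitive, may span lines).
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)

def is_access_denied(status_code, html):
    """Guess whether a response is a block page rather than real content.

    Assumption: the site signals a block with HTTP 403 and/or a page
    whose title contains "Access Denied", as in the question.
    """
    if status_code == 403:
        return True
    match = TITLE_RE.search(html)
    return bool(match) and 'access denied' in match.group(1).lower()
```

With requests, this could be called as `is_access_denied(response.status_code, response.text)` on the object returned by `requests.get(url, headers=headers)`, before the text is handed to BeautifulSoup. If it still returns True with the header set, the site is probably filtering on something other than the User-Agent.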