Problem description
I'm trying to scrape a website using requests and BeautifulSoup. When I run the code to extract the tags of the webpage, the soup object is blank. I printed the response object to see whether the request was successful, and it was not: the printed result shows response 447. I can't find what 447 means as an HTTP status code. Does anyone know how I can successfully connect and scrape the site?
Code:
import requests
from bs4 import BeautifulSoup

r = requests.get('https://foobar')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get_text())
Output:
''
When I print the request object:
print(r)
Output:
<Response [447]>
Recommended answer
Most likely the site has detected your activity as automated and is blocking your access. You can often fix this by including browser-like headers in your request to the site.
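import bs4
import requests

url = 'https://foobar'  # placeholder URL from the question
session = requests.Session()
headers = {  # browser-like headers so the request does not look like a script
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}
req = session.get(url, headers=headers)
soup = bs4.BeautifulSoup(req.text, 'html.parser')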
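To confirm whether the headers helped, it may be worth inspecting the status code before parsing. The sketch below is not part of the original answer; it uses only the standard library's `http.HTTPStatus`, and the helper name `describe_status` is hypothetical. It illustrates that 447 is not among the codes Python knows about, while still falling in the 4xx (client error) range, which is consistent with the server rejecting the request.

```python
from http import HTTPStatus

# Hypothetical helper, for illustration only.
def describe_status(code: int) -> str:
    """Return the standard phrase for a status code, or its class if non-standard."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Code is not defined in http.HTTPStatus; fall back to its hundreds class.
        classes = {
            1: "informational",
            2: "success",
            3: "redirection",
            4: "client error",
            5: "server error",
        }
        return f"non-standard ({classes.get(code // 100, 'unknown')} class)"


print(describe_status(200))
print(describe_status(447))
```

With the `req` object from the answer, `describe_status(req.status_code)` could be checked before building the soup; `req.raise_for_status()` is another option if an exception on any 4xx or 5xx response is preferred.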