Problem Description
I'm using BeautifulSoup and Requests to scrape allrecipes user data.
When inspecting the HTML code I find that the data I want is contained within
<article class="profile-review-card">
However when I use the following code
import requests
from bs4 import BeautifulSoup

URL = 'http://allrecipes.com/cook/2010/reviews/'
response = requests.get(URL).content
soup = BeautifulSoup(response, 'html.parser')
X = soup.find_all('article', class_="profile-review-card")
While soup and response are full of HTML, X is empty. Looking through the output, I see some inconsistencies between what Inspect Element shows and what requests.get(URL).content returns. What is going on?
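The mismatch can be reproduced without any network access. Below is a minimal sketch using a hypothetical static snippet: the raw HTML contains only a placeholder and a script, and since the parser never executes JavaScript, the article element the script would inject is simply not there to find.

```python
from bs4 import BeautifulSoup

# Static HTML as requests would see it: the review card is inserted
# later by JavaScript, so only a placeholder and the script are present.
raw_html = """
<html><body>
  <div id="reviews"></div>
  <script>
    document.getElementById('reviews').innerHTML =
      '<article class="profile-review-card">...</article>';
  </script>
</body></html>
"""

soup = BeautifulSoup(raw_html, 'html.parser')
cards = soup.find_all('article', class_='profile-review-card')
print(len(cards))  # 0: the script text is parsed as text, never executed
```

This is exactly why X is empty even though soup is full of HTML: the browser's Inspect Element shows the DOM after scripts have run, while requests.get(URL).content is the HTML before they run.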
Recommended Answer
That's because the content is loaded via Ajax/JavaScript. The Requests library doesn't execute scripts, so you'll need something that can run them and hand you the resulting DOM. There are various options; I'll list a couple to get you started.
- Selenium
- ghost.py
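With Selenium, the idea is to let a real browser render the page and then parse the rendered DOM. Here is a minimal sketch assuming Selenium is installed and a chromedriver is on the PATH; the function name is illustrative, not part of any library.

```python
def fetch_review_cards(url):
    """Hypothetical sketch: render the page in a browser, then parse
    the post-JavaScript DOM with BeautifulSoup."""
    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Chrome()  # assumes chromedriver is on PATH
    try:
        driver.get(url)
        # page_source is the DOM after scripts have run,
        # matching what Inspect Element shows
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        return soup.find_all('article', class_='profile-review-card')
    finally:
        driver.quit()
```

Headless mode (via Selenium's browser options) is a common choice if you don't want a visible browser window while scraping.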