Scraping JavaScript-rendered content with Selenium and Beautiful Soup in Python


Question

I'm trying to scrape a JavaScript-enabled page using Beautiful Soup (BS) and Selenium. I have the following code so far. It still doesn't seem to pick up the JavaScript-generated content, and it returns an empty result. In this case I'm trying to scrape the Facebook comments at the bottom of the page. (Inspect Element shows the class as postText.)
Thanks for the help!

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup.BeautifulSoup(html_source)
comments = soup("div", {"class":"postText"})
print comments

Recommended answer

There are some mistakes in your code; they are fixed below. However, the class "postText" must exist elsewhere, since it is not defined in the original page source. My revised version of your code was tested and works on several websites.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')
# The class "postText" is not defined in the page source, so this list comes back empty here
comments = soup.find_all('div', {'class': 'postText'})
print(comments)
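
To check the claim that "postText" is missing from the retrieved source, and to see where else the comments might live, a small diagnostic can help. The sketch below is not part of the original answer. It assumes the comments could be loaded inside an embedded frame, which the source does not confirm. The names ClassAndFrameScanner, scan_page_source, and the sample markup are hypothetical, and it uses only the standard library so it runs without a browser.

```python
from html.parser import HTMLParser


class ClassAndFrameScanner(HTMLParser):
    """Count elements carrying a given class and collect iframe sources."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.class_hits = 0
        self.frame_sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute is a space-separated list of class names.
        classes = (attributes.get("class") or "").split()
        if self.class_name in classes:
            self.class_hits += 1
        # Record where each embedded frame points (None if it has no src).
        if tag == "iframe":
            self.frame_sources.append(attributes.get("src"))


def scan_page_source(html_source, class_name):
    """Return (number of elements with class_name, list of iframe src values)."""
    scanner = ClassAndFrameScanner(class_name)
    scanner.feed(html_source)
    scanner.close()
    return scanner.class_hits, scanner.frame_sources


# Hypothetical stand-in for browser.page_source; not the real TechCrunch markup.
sample_source = """
<html><body>
  <div class="article">Post body</div>
  <iframe src="https://comments.example/plugin"></iframe>
</body></html>
"""

hits, frames = scan_page_source(sample_source, "postText")
print(hits, frames)
```

If the hit count is zero but frames are listed, the content may be rendered inside one of those frames rather than in the top-level document, in which case browser.page_source alone would not contain it. Selenium's browser.switch_to.frame(...) is one way to move into a frame before reading its source.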
