Question
I am just beginning to use Beautiful Soup to scrape websites. I think I have the basics down, even though I lack theoretical knowledge about web pages, so I will do my best to formulate my question.
By "dynamic web page" I mean a site whose HTML changes based on user action. In my case, the page has collapsible tables.
I want to obtain the data inside a "div" tag, but when the page loads, the data seems to be unavailable in the HTML code. When you click on a table, it expands, and the "class" of this "div" changes from something like "something blabla collapsible" to "something blabla collapsible active". Once it is in that expanded state, I know how to scrape it.
Can I get this data using Beautiful Soup? If I can't, I thought of using something like Selenium to click on all the tables and then download the resulting HTML, which I could scrape. Is there an easier way?
Thank you very much.
Recommended answer
It depends. If the data is already present when the page loads, then it is available to scrape; it is just in a different element or hidden from view. If the click event triggers the loading of the data in some way, then no: you will need Selenium or another headless browser to automate the click.
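The first case can be illustrated with a minimal sketch. The markup below is made up for illustration (the class names come from the question; the table contents are hypothetical): the panel is collapsed, but its rows are already in the HTML, so Beautiful Soup can read them without any click.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the panel is collapsed (hidden by CSS), but its
# rows are already part of the HTML that the server sent.
html = """
<div class="something blabla collapsible">
  <table>
    <tr><td>alpha</td><td>1</td></tr>
    <tr><td>beta</td><td>2</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_="collapsible" matches any div that has "collapsible" among its
# classes, whether or not "active" is also present.
rows = []
for panel in soup.find_all("div", class_="collapsible"):
    for tr in panel.find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)  # [['alpha', '1'], ['beta', '2']]
```

A quick way to tell which case applies is to search the raw page source (not the browser's live element inspector) for a value you can see in the expanded table. If it is there, a parser alone is probably enough.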
Beautiful Soup is only an HTML parser, so the data you get back from requesting the page is the only data Beautiful Soup can access.
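To make that concrete, here is a small sketch with invented markup (the `details` id and the script body are assumptions, not taken from any real site). The div is meant to be filled in by JavaScript, but Beautiful Soup never runs scripts, so the div stays empty in the parsed tree.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the div is empty in the served HTML, and a script
# would fill it in when run by a browser.
html = """
<div id="details" class="collapsible"></div>
<script>
  document.getElementById("details").textContent = "loaded on click";
</script>
"""

soup = BeautifulSoup(html, "html.parser")

# The parser sees the div exactly as it was served: with no content.
panel_text = soup.find("div", id="details").get_text(strip=True)

# The script is visible only as text; it is never executed.
script_source = soup.find("script").string

print(repr(panel_text))                    # ''
print("loaded on click" in script_source)  # True
```

When a page behaves like this, the content has to come from something that executes the JavaScript, such as a browser driven by Selenium, before the HTML is handed to the parser.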