Problem description
I'm using rvest in R to do some scraping. I know some HTML and CSS.
I want to get the price of every product listed on a page.
New items load as you scroll down the page.
What I've done so far:
library(rvest)
# read_html() replaces html(), which is deprecated in rvest
Linio_Celulares <- read_html("http://www.linio.com.co/celulares-telefonia-gps/")
Linio_Celulares %>%
  html_nodes(".product-itm-price-new") %>%
  html_text()
This gives me what I need, but only for the first 25 elements (the ones loaded by default):
[1] "$ 1.999.900" "$ 1.999.900" "$ 1.999.900" "$ 2.299.900" "$ 2.279.900"
[6] "$ 2.279.900" "$ 1.159.900" "$ 1.749.900" "$ 1.879.900" "$ 189.900"
[11] "$ 2.299.900" "$ 2.499.900" "$ 2.499.900" "$ 2.799.000" "$ 529.900"
[16] "$ 2.699.900" "$ 2.149.900" "$ 189.900" "$ 2.549.900" "$ 1.395.900"
[21] "$ 249.900" "$ 41.900" "$ 319.900" "$ 149.900"
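The scraped prices are plain strings such as "$ 1.999.900". If numeric values are needed afterwards, a minimal base-R sketch follows. The helper name `parse_price` is hypothetical, and it assumes "." is only ever used as a thousands separator (no decimal part), as in the output above.

```r
# Hypothetical helper: turn price strings like "$ 1.999.900" into numbers.
# Assumes "." is a thousands separator only, as in the scraped output.
parse_price <- function(x) {
  digits <- gsub("[^0-9]", "", x)  # drop "$", spaces and the thousands dots
  as.numeric(digits)
}

prices_txt <- c("$ 1.999.900", "$ 189.900", "$ 41.900")
prices <- parse_price(prices_txt)
print(prices)
```

Stripping every non-digit keeps the helper simple. It would give wrong results if a price ever contained decimal cents, so check the site's format first.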
Question: how do I get all the elements of this dynamically loaded section?
I guess I could scroll the page until all elements are loaded and then use html(URL). But that seems like a lot of manual work, since I plan to do this for several different sections. There should be a programmatic workaround.
Any hints are welcome!
Recommended answer
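As @nrussell suggested, you can use RSelenium to scroll down the page programmatically before getting the source code. Plain rvest only sees the HTML the server sends initially, while a browser driven by RSelenium runs the page's JavaScript and loads the extra items.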
You could do something like this:
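library(RSelenium)
library(rvest)
# start a Selenium server and an open browser session
# (checkForServer()/startServer() are defunct in current RSelenium; rsDriver() replaces them;
#  the choice of firefox is an assumption, any installed supported browser works)
rD <- rsDriver(browser = "firefox")
remDr <- rD$client
# navigate to your page
remDr$navigate("http://www.linio.com.co/tecnologia/celulares-telefonia-gps/")
# scroll down 5 times, waiting 3 seconds each time for new items to load
for (i in 1:5) {
  remDr$executeScript(paste("scroll(0,", i * 10000, ");"))
  Sys.sleep(3)
}
# get the rendered page HTML (getPageSource() returns a list)
page_source <- remDr$getPageSource()
# parse it; read_html() replaces the deprecated html()
read_html(page_source[[1]]) %>%
  html_nodes(".product-itm-price-new") %>%
  html_text()
# close the browser and stop the Selenium server
remDr$close()
rD$server$stop()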
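The loop above scrolls a fixed 5 times, which may stop too early on long listings or waste time on short ones. A hedged alternative, not part of the original answer, is to keep scrolling until the page height stops growing. The helper `scroll_until_stable()` is hypothetical; it takes two functions as arguments so the stopping logic does not depend on RSelenium directly, and the 3-second default wait is an assumed value that may need tuning.

```r
# Hypothetical helper: scroll until the page height stops growing.
# get_height: function returning the current page height (a number)
# scroll:     function that scrolls to the bottom of the page
# wait:       seconds to let new items load after each scroll (assumed default)
# max_rounds: safety cap so the loop always terminates
scroll_until_stable <- function(get_height, scroll, wait = 3, max_rounds = 50) {
  last_height <- get_height()
  for (n in seq_len(max_rounds)) {
    scroll()
    Sys.sleep(wait)
    new_height <- get_height()
    if (new_height <= last_height) {
      return(invisible(n))  # nothing new loaded: stop and report scrolls done
    }
    last_height <- new_height
  }
  warning("max_rounds reached; the page may still be loading items")
  invisible(max_rounds)
}

# Possible use with the remDr session from the answer (requires a running browser):
# scroll_until_stable(
#   get_height = function() unlist(remDr$executeScript("return document.body.scrollHeight;")),
#   scroll     = function() remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
# )
```

Passing the two actions in as functions also makes the stopping rule easy to check without a browser, by supplying fake height and scroll functions.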