问题描述
我正在尝试从此页面抓取数据:
I am trying to scrape data from this page:
If I try to scrape the name of the players using the css selector and the usual rvest syntax:
names <- read_html("http://www.atpworldtour.com/en/tournaments/brisbane-international-presented-by-suncorp/339/2016/match-stats/r975/f324/match-stats?") %>%
html_nodes(".scoring-player-name") %>% sapply(html_text)
everything goes well.
Unfortunately if I try to scrape the statistics below (first serve pts won, ..)using the selector .stat-breakdown span
I am not able to retrieve any data.
I know rvest is generally not recommended to scrape pages created dynamically, however I don't understand why some data are scraped and some not.
I don't use Rvest. If you follow the code below you should get to the format which is in the picture basically a string which you could transform to dataframe based on separators :, .
This Tag also contains more information than it was displayed in UI of webpage.I can try also RSelenium but need to get my other PC. So I would let you know if RSelenium worked for me.
library(XML)
library(RCurl)
library(stringr)
url<-"http://www.atpworldtour.com/en/tournaments/brisbane-international-presented-by-suncorp/339/2016/match-stats/r975/f324/match-stats?"
url2<-getURL(url)
parsed<-htmlParse(url2)
# get messi data from tag
step1<-xpathSApply(parsed,"//script[@id='matchStatsData']",xmlValue)
# removing some unwanted characters
step2<-str_replace_all(step1,"\r\n","")
step3<-str_replace_all(step2,"\t","")
step4<-str_replace_all(step3,"[[{}]\"]","")
Output then is a string like this
这篇关于使用 rvest 从闪存页面中抓取数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!