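I am trying to scrape this website:
http://www.racingpost.com/greyhounds/result_home.sd#resultDay=2015-12-26&meetingId=18&isFullMeeting=true
with the rvest package in R.
Unfortunately, rvest does not seem to find the nodes through their CSS selectors.
For example, if I try to extract the information in the header of each table (Grade, Prize, Distance), whose CSS selector is ".black", and run the following code: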
library(rvest)
URL <- read_html("http://www.racingpost.com/greyhounds/result_home.sd#resultDay=2015-12-26&meetingId=18&isFullMeeting=true")
nodes <- html_nodes(URL, ".black")
nodes comes back as an empty list, so nothing is scraped.
Best answer
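The page makes an XHR request to generate the HTML. Everything after the # in your URL is a fragment, which is never sent to the server, so read_html only receives the page shell without the results. Try the following, which requests the results directly (it will also make it easier to automate the data capture):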
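library(httr)
library(xml2)
library(rvest)

# Request the results for one meeting; date and meeting id go in the query string
res <- GET("http://www.racingpost.com/greyhounds/result_by_meeting_full.sd",
           query = list(r_date = "2015-12-26",
                        meeting_id = 18))
stop_for_status(res)  # fail early on an HTTP error

# Parse the returned HTML and select the header values
doc <- read_html(content(res, as = "text"))
html_nodes(doc, ".black")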
## {xml_nodeset (56)}
## [1] <span class="black">A9</span>
## [2] <span class="black">£61</span>
## [3] <span class="black">470m</span>
## [4] <span class="black">-30</span>
## [5] <span class="black">H2</span>
## [6] <span class="black">£105</span>
## [7] <span class="black">470m</span>
## [8] <span class="black">-30</span>
## [9] <span class="black">A7</span>
## [10] <span class="black">£61</span>
## [11] <span class="black">470m</span>
## [12] <span class="black">-30</span>
## [13] <span class="black">A5</span>
## [14] <span class="black">£66</span>
## [15] <span class="black">470m</span>
## [16] <span class="black">-30</span>
## [17] <span class="black">A8</span>
## [18] <span class="black">£61</span>
## [19] <span class="black">470m</span>
## [20] <span class="black">-20</span>
## ...
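Since the answer mentions automating the capture, here is a minimal sketch of how that might look. Both function names (fetch_black, headers_to_table) are hypothetical and not part of the original answer. It assumes httr, xml2 and rvest are installed, that every race header contributes exactly four ".black" spans in the same order (as the output above suggests), and it uses the placeholder name value4 for the fourth span, whose meaning the page output does not state.

```r
# Hypothetical helper: fetch one meeting and return the text of its ".black" nodes.
# Uses the same endpoint and query parameters as the answer above.
fetch_black <- function(r_date, meeting_id) {
  res <- httr::GET("http://www.racingpost.com/greyhounds/result_by_meeting_full.sd",
                   query = list(r_date = r_date, meeting_id = meeting_id))
  httr::stop_for_status(res)
  doc <- xml2::read_html(httr::content(res, as = "text"))
  rvest::html_text(rvest::html_nodes(doc, ".black"))
}

# Hypothetical helper: reshape the flat vector into one row per race.
# Assumption: four values per race; "value4" is a placeholder column name.
headers_to_table <- function(values) {
  stopifnot(length(values) %% 4 == 0)
  races <- as.data.frame(matrix(values, ncol = 4, byrow = TRUE),
                         stringsAsFactors = FALSE)
  names(races) <- c("grade", "prize", "distance", "value4")
  races$prize    <- as.numeric(gsub("[^0-9.]", "", races$prize))  # drop the currency sign
  races$distance <- as.numeric(sub("m$", "", races$distance))     # drop the trailing "m"
  races$value4   <- as.numeric(races$value4)
  races
}

# Offline check with the first eight values from the output above
sample_values <- c("A9", "£61", "470m", "-30",
                   "H2", "£105", "470m", "-30")
races <- headers_to_table(sample_values)
print(races)

# Live use (needs network access and the site still serving this endpoint):
# races <- headers_to_table(fetch_black("2015-12-26", 18))
```

To cover several days, the same call could be wrapped in lapply over a vector of date strings, with a Sys.sleep between requests to avoid hammering the server.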
Source: the Stack Overflow question "r - Rvest not recognizing CSS selector": https://stackoverflow.com/questions/34473847/