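I am trying to scrape data from the Extra Skater website into a data frame. Looking at the HTML, there are several row classes that the page toggles between to show different table rows. I am only interested in rows with this tag: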
<tr class="team-game-stats team-game-stats-5v5close hidden">
For example:
<tr class="team-game-stats team-game-stats-5v5close hidden">
<td class="hidden">5v5close</td>
<td><a href="/game/2013-01-19-maple-leafs-canadiens">2013-01-19: Maple Leafs 2 at Canadiens 1</a></td>
<td class="number-right">19.7</td>
<td class="number-right">0</td>
<td class="number-right">0</td>
<td class="number-right">14</td>
<td class="number-right">18</td>
<td class="number-right">43.8%</td>
<td class="number-right">11</td>
<td class="number-right">15</td>
<td class="number-right">42.3%</td>
<td class="number-right">8</td>
<td class="number-right">11</td>
<td class="number-right">42.1%</td>
<td class="number-right">0.0%</td>
<td class="number-right">100.0%</td>
</tr>
When I run this code:
library(RCurl)
library(XML)
theurl <- "http://www.extraskater.com/team/montreal-canadiens/2012/gamelog"
tb = readHTMLTable(theurl)
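it returns a list in which all the table rows, whatever their class, are stacked one on top of another. I think I need xpathSApply to be more precise, but I am not sure what to pass as the path argument. When I run this code: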
library(RCurl)
library(XML)
theurl <- "http://www.extraskater.com/team/montreal-canadiens/2012/gamelog"
webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
pagetree <- htmlTreeParse(webpage, useInternalNodes = TRUE)
# Extract table header and contents
results <- xpathSApply(pagetree, "//*/table[@class='team-game-stats team-game-stats-5v5close hidden']/tr/td", xmlValue)
the result comes back as NULL.
Thanks for your time.
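Best answer
In the HTML you posted, the class attribute sits on the <tr> elements, not on a <table>, so an XPath that looks for a table with that class matches nothing. Match the rows themselves instead. Try this: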
xxpath = "//*[@class='team-game-stats team-game-stats-5v5close hidden']"
xpathApply(pagetree,xxpath,readHTMLList)
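
If the goal is a data frame rather than a list, the matched rows can be stacked afterwards. The following is only a sketch under stated assumptions: it parses a small inline HTML sample instead of the live page, so that it runs without network access. The first 5v5close row reuses a few cells from the example row in the question, while the other two rows and the names html, doc, rows, cells and df are placeholders made up for illustration. The real page has more columns, and its column headers are not shown in the question, so the columns are left with R's default names.

```r
library(XML)

# Inline sample standing in for the downloaded page (assumption: the
# real page has the same row structure, with more columns per row).
html <- '<table>
<tr class="team-game-stats team-game-stats-all"><td class="hidden">all</td><td>Placeholder game</td><td class="number-right">50.0%</td></tr>
<tr class="team-game-stats team-game-stats-5v5close hidden"><td class="hidden">5v5close</td><td>2013-01-19: Maple Leafs 2 at Canadiens 1</td><td class="number-right">43.8%</td></tr>
<tr class="team-game-stats team-game-stats-5v5close hidden"><td class="hidden">5v5close</td><td>Placeholder game</td><td class="number-right">50.0%</td></tr>
</table>'
doc <- htmlParse(html, asText = TRUE)

# Same idea as the answer: match the <tr> elements by their class.
xxpath <- "//tr[@class='team-game-stats team-game-stats-5v5close hidden']"
rows <- getNodeSet(doc, xxpath)

# One character vector of cell values per matching row.
cells <- lapply(rows, function(row) xpathSApply(row, "./td", xmlValue))

# Stack the rows; columns get default names V1, V2, ...
df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)

# Percentages arrive as text; strip the sign before converting.
df$V3 <- as.numeric(sub("%", "", df$V3, fixed = TRUE))
```

With the live page, pagetree from the question would take the place of doc. Every cell is read as text, so each numeric column needs an explicit conversion like the one shown for the last column.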