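I'm trying to scrape data from this website, Extra Skater, into a data frame. Looking at the HTML, I can see there are several row classes that the page toggles between to show different table rows. I'm only interested in the rows with this tag: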

<tr class="team-game-stats team-game-stats-5v5close hidden">


For example:

<tr class="team-game-stats team-game-stats-5v5close hidden">
    <td class="hidden">5v5close</td>

    <td><a href="/game/2013-01-19-maple-leafs-canadiens">2013-01-19: Maple Leafs 2 at Canadiens 1</a></td>

    <td class="number-right">19.7</td>
    <td class="number-right">0</td>
    <td class="number-right">0</td>
    <td class="number-right">14</td>
    <td class="number-right">18</td>
    <td class="number-right">43.8%</td>
    <td class="number-right">11</td>
    <td class="number-right">15</td>
    <td class="number-right">42.3%</td>
    <td class="number-right">8</td>
    <td class="number-right">11</td>
    <td class="number-right">42.1%</td>
    <td class="number-right">0.0%</td>
    <td class="number-right">100.0%</td>

</tr>


When I run this code:

library(RCurl)
library(XML)
theurl <- "http://www.extraskater.com/team/montreal-canadiens/2012/gamelog"
tb = readHTMLTable(theurl)


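it returns a list in which the table rows of every class are stacked on top of one another. I think I need to use xpathSApply to be more precise, but I'm not sure what to pass as the path argument. When I run this code: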

library(RCurl)
library(XML)

theurl <- "http://www.extraskater.com/team/montreal-canadiens/2012/gamelog"
webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)

pagetree <- htmlTreeParse(webpage, useInternalNodes = TRUE)

# Extract table header and contents
results <- xpathSApply(pagetree, "//*/table[@class='team-game-stats team-game-stats-5v5close hidden']/tr/td", xmlValue)


the result comes back as NULL.

Thanks for your time.

Best answer

Try this:

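# Match any element carrying this exact class attribute (here, the <tr> rows)
xxpath = "//*[@class='team-game-stats team-game-stats-5v5close hidden']"
# pagetree is the parsed document built in the question
xpathApply(pagetree, xxpath, readHTMLList)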
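
The XPath in the question likely returns nothing because it looks for the class on a table element, while the class attribute sits on the tr rows. As a rough sketch of getting from the matched rows to a data frame, the example below parses a small inline HTML sample instead of the live page. The sample is abbreviated and made up for illustration: the first row's class (team-game-stats-all) and its cell values are assumptions, not copied from the site.

```r
library(XML)

# Inline sample standing in for the downloaded page.
# The first row (class "team-game-stats-all") is hypothetical.
html <- '<table>
<tr class="team-game-stats team-game-stats-all">
  <td class="hidden">all</td><td>Game A</td><td class="number-right">60.0</td>
</tr>
<tr class="team-game-stats team-game-stats-5v5close hidden">
  <td class="hidden">5v5close</td><td>2013-01-19: Maple Leafs 2 at Canadiens 1</td><td class="number-right">19.7</td>
</tr>
</table>'

doc <- htmlParse(html, asText = TRUE)

# Select the <tr> rows by their class attribute (the class is on <tr>, not <table>)
rows <- getNodeSet(doc, "//tr[@class='team-game-stats team-game-stats-5v5close hidden']")

# One character vector of cell values per matching row
cells <- lapply(rows, function(r) xpathSApply(r, "./td", xmlValue))

# Stack the rows into a data frame; every column is still character here
df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
print(df)
```

On the real page the same steps should apply to pagetree in place of doc. The percentage and numeric columns would still need converting, for example with as.numeric(sub("%", "", x)).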
