Problem description
I would like to scrape the match result table from the website https://www.whoscored.com/Regions/247/Tournaments/36/Seasons/5967/Stages/15737/Fixtures/International-FIFA-World-Cup-2018
I'm using the rvest package with the following code:
library(rvest)
url.tournament <- "https://www.whoscored.com/Regions/247/Tournaments/36/Seasons/5967/Stages/15737/Fixtures/International-FIFA-World-Cup-2018"
# Select the fixture wrapper, then every table inside it, and convert to data frames
df.tournament <- read_html(url.tournament) %>%
  html_nodes(xpath = '//*[@id="tournament-fixture-wrapper"]') %>%
  html_nodes("table") %>%
  html_table()
However, no element is extracted.
If you look at the website's source code, you can see that the table doesn't actually exist in the HTML source; it's generated dynamically using JavaScript. That's why your XPath query returns an empty <div>.
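One way to confirm this for yourself is to count the <table> nodes inside the wrapper before calling html_table(). The sketch below is only an illustration: count_wrapper_tables is a hypothetical helper name, and static_page is a made-up stand-in for what the server sends before any JavaScript runs, not the real page.

```r
library(rvest)

# Hypothetical helper: count <table> nodes inside the fixture wrapper.
# `html` may be a URL, a file path, or a literal HTML string.
count_wrapper_tables <- function(html) {
  wrapper <- read_html(html) %>%
    html_nodes(xpath = '//*[@id="tournament-fixture-wrapper"]')
  length(html_nodes(wrapper, "table"))
}

# Assumption: a made-up stand-in for the static page, where the wrapper
# exists but has no children yet.
static_page <- '<html><body><div id="tournament-fixture-wrapper"></div></body></html>'
count_wrapper_tables(static_page)
```

If the count is zero for the static HTML, html_table() has nothing to convert, which matches what you are seeing.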
Consequently, you can't rely on {rvest} alone in this case; you need a dynamic scraper such as {RSelenium}, which drives a real browser that can execute JavaScript.
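A minimal sketch of that approach, under a few assumptions: Firefox and a matching driver are available locally, a fixed pause is long enough for the table to render, and the site allows automated browsing. The function names extract_fixture_tables and scrape_fixtures and the wait_seconds parameter are hypothetical; only rsDriver(), navigate(), getPageSource(), close() and stop() come from {RSelenium}. The idea is to let the browser render the page and then hand the rendered HTML back to {rvest}, using the same selectors as in the question.

```r
library(rvest)

# Parse rendered HTML (a single string) and return every table inside the
# fixture wrapper as a list of data frames.
extract_fixture_tables <- function(page_source) {
  read_html(page_source) %>%
    html_nodes(xpath = '//*[@id="tournament-fixture-wrapper"]') %>%
    html_nodes("table") %>%
    html_table()
}

# Hypothetical wrapper: open the page in a real browser, wait, then parse.
scrape_fixtures <- function(url, wait_seconds = 5) {
  # Assumption: Firefox is installed and a driver can be started locally
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  # Always shut down the browser and the Selenium server, even on error
  on.exit({
    driver$client$close()
    driver$server$stop()
  }, add = TRUE)

  driver$client$navigate(url)
  # Assumption: a fixed pause is enough for the JavaScript to build the table
  Sys.sleep(wait_seconds)

  # getPageSource() returns a list whose first element is the rendered HTML
  page_source <- driver$client$getPageSource()[[1]]
  extract_fixture_tables(page_source)
}

# Usage (starts a browser, so it is left commented out):
# df.tournament <- scrape_fixtures(url.tournament)
```

Keeping the parsing step in its own function means you can check the selectors against a saved copy of the rendered page without starting a browser each time.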