Scraping a table from a website using R (rvest) or VBA, if possible

Problem description

I am trying to scrape the table from this URL: "https://hutdb.net/17/players". I have spent a lot of time learning rvest and using SelectorGadget, but whenever I try to get an output I always get the same empty result, character(0).

library(rvest)
library(magrittr)

# Download and parse the page's HTML
page  <- read_html("https://hutdb.net/17/players")
# Select every table cell and extract its text
cells <- page %>%
  html_nodes("td") %>%
  html_text()

Any help would be greatly appreciated.

Recommended answer

The data is loaded dynamically and cannot be retrieved directly from the HTML. However, the "Network" tab of Chrome DevTools, for instance, shows that it comes from a nicely formatted JSON at https://hutdb.net/ajax/stats.php?year=17&page=0&selected=OVR&sort=DESC
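
The character(0) in the question is not an error but an empty result: the "td" selector matched nothing. That is what to expect when the rows are added by JavaScript after the page loads, because read_html() only parses the HTML the server sends and never runs scripts. A minimal offline reproduction, assuming a made-up page (the markup below is hypothetical, not hutdb.net's actual source):

```r
library(rvest)

# Hypothetical stand-in for what the server sends: an empty table body that
# a script is supposed to fill in after the page loads in a browser.
static_html <- '<html><body>
  <table id="players"><thead><tr><th>Player</th></tr></thead><tbody></tbody></table>
  <script src="players.js"></script>
</body></html>'

# read_html() accepts a literal HTML string as well as a URL
page <- read_html(static_html)

# No JavaScript is executed, so there are no <td> cells to select
cells <- html_text(html_nodes(page, "td"))
cells
#> character(0)

# Counting the matched nodes first makes this failure mode obvious
length(html_nodes(page, "td"))
#> [1] 0
```

When the count is 0 but the browser clearly shows the table, the data is arriving through a separate request, which is what the Network tab reveals.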

library(jsonlite)
# Query the JSON endpoint directly; fromJSON() simplifies the records into a data frame
dat <- fromJSON("https://hutdb.net/ajax/stats.php?year=17&page=0&selected=OVR&sort=DESC")

The output looks like this:

#     results aOVR    id League Year Card Team              Player Position Type Shoots  HGT
# 1      6308 6308  <NA>   <NA> <NA> <NA> <NA>                <NA>     <NA> <NA>   <NA> <NA>
# 2      <NA> 2030 11782    NHL   17  MOV  OTT       Erik Karlsson       RD  OFD  Right  6'0
# 3      <NA> 2060 11785    NHL   17  MOV  TBL       Victor Hedman       LD  TWD   Left  6'6
# 4      <NA> 2008 11791    NHL   17  MOV  CHI        Patrick Kane       RW  SNP   Left 5'11
# 5      <NA> 2058 13845    NHL   17  SCE  ANA        Ryan Getzlaf        C  PWF  Right  6'4
# 6      <NA> 2074 11824    NHL   17  MOV  BOS       Brad Marchand       LW  TWF   Left  5'9
# 7      <NA> 2008 11829    NHL   17  MOV  EDM      Connor McDavid        C  PLY   Left  6'2
# 8      <NA> 2048 11840    NHL   17  MOV  WSH   Nicklas Backstrom        C  PLY   Left  6'1
# 9      <NA> 2058 11841    NHL   17  MOV  PIT       Sidney Crosby        C  PLY   Left 5'11
# 10     <NA> 2065 13644    NHL   17 TOTY  WPG        Patrik Laine       RW  TWF  Right  6'3
# 11     <NA> 2008 13645    NHL   17 TOTY  EDM      Connor McDavid        C  PLY   Left  6'2
# 12     <NA> 2039 13680    NHL   17 TOTY  LAK        Drew Doughty       RD  TWD  Right  6'1
# 13     <NA> 2063 13689    NHL   17 TOTY  BOS    Patrice Bergeron        C  TWF  Right  6'2

08-22 20:45