Scraping an XML/JavaScript table with R

Question

I want to scrape a table like this one: http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC/ Specifically, I'd like to scrape the bookmakers and the odds. The problem is that I don't know what kind of table it is or how to scrape it.

These threads might help me ("Scraping javascript with R" or "What type of HTML table is this and what type of webscraping techniques can you use?"), but I'd appreciate it if someone could point me in the right direction or, better yet, give instructions here.

So what kind of table is that odds table? Is it possible to scrape it with R, and if so, how?

I should have been clearer. I have been scraping data with R for some time now and probably don't need help with the basics. After further inspection, the table is indeed rendered with JavaScript; that is the problem and what I need help with.

Recommended answer

You can use Selenium and RSelenium to get the relevant data:

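library(RSelenium)
library(XML)  # provides readHTMLTable()
appURL <- "http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC"
# startServer() works in older RSelenium releases; newer ones removed it in favor of rsDriver().
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
# Load the page in a real browser so that its JavaScript runs.
remDr$navigate(appURL)
# Fetch the rendered HTML of the first table (tbls is presumably a variable defined by the page's own scripts).
tblSource <- remDr$executeScript("return tbls[0].outerHTML;")[[1]]
# Parse the HTML string into a list of data frames.
readHTMLTable(tblSource)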
> readHTMLTable(tblSource)
$`NULL`
     Bookmakers    1    X    2 Payout
1   bet-at-home 2.25 3.80 2.60  91.6%
2        bet365 2.29 3.79 2.64  92.7%
3       Betsson 2.35 3.75 2.65  93.5%
4          bwin 2.30 3.75 2.70  93.3%
5   MarathonBet 2.35 3.80 2.78  95.4%
6      Titanbet 2.30 3.95 2.50  91.9%
7       TonyBet 2.35 3.70 2.70  93.8%
8        Unibet 2.35 3.85 2.60  93.5%
9  William Hill 2.30 3.90 2.50  91.6%
10       Winner 2.30 3.95 2.50  91.9%
11       youwin 2.40 3.75 2.55  93.0%
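
The table comes back as text, and a page like this may leave non-breaking spaces around some bookmaker names, so a little cleanup is usually needed before the odds can be used in calculations. The following is only a minimal sketch in base R, not part of the original answer. The `odds` data frame is a hypothetical stand-in holding three rows copied from the output above. The final check assumes the Payout column is the inverse of the summed inverse odds, which agrees with these three rows but is not confirmed by the site.

```r
# Hypothetical stand-in for the data frame returned by readHTMLTable();
# the rows are copied from the output above, and the bet365 entry is given
# surrounding non-breaking spaces to illustrate the cleanup step.
odds <- data.frame(
  Bookmakers = c("bet-at-home", "\u00a0bet365\u00a0\u00a0", "Betsson"),
  `1` = c("2.25", "2.29", "2.35"),
  X = c("3.80", "3.79", "3.75"),
  `2` = c("2.60", "2.64", "2.65"),
  Payout = c("91.6%", "92.7%", "93.5%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop non-breaking spaces, then trim ordinary whitespace.
odds$Bookmakers <- trimws(gsub("\u00a0", "", odds$Bookmakers, fixed = TRUE))

# Convert the three odds columns from text to numbers.
odds_cols <- c("1", "X", "2")
odds[odds_cols] <- lapply(odds[odds_cols], as.numeric)

# Turn a string such as "91.6%" into the fraction 0.916.
odds$Payout <- as.numeric(sub("%", "", odds$Payout, fixed = TRUE)) / 100

# Assumed formula (not confirmed by the source): payout = 1 / sum(1 / odds).
odds$PayoutCheck <- round(1 / rowSums(1 / odds[odds_cols]), 3)

print(odds)
```

With the real result, `odds` would instead be taken from the list that `readHTMLTable()` returns, for example its first element.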
