Question
I'm attempting to scrape data from http://www.footballoutsiders.com/stats/snapcounts, but I can't change the fields in the drop-down boxes on the site ("team", "week", "position", and "year"). My attempt to scrape the table associated with team = "ALL", week = "1", pos = "ALL", and year = "2015" with rvest is below.
library(rvest)

url <- "http://www.footballoutsiders.com/stats/snapcounts"
pgsession <- html_session(url)
pgform <- html_form(pgsession)[[3]]
filled_form <- set_values(pgform,
                          "team" = "ALL",
                          "week" = "1",
                          "pos"  = "ALL",
                          "year" = "2015")
submit_form(session = pgsession, form = filled_form, POST = url)

y <- read_html("http://www.footballoutsiders.com/stats/snapcounts")
y <- y %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(header = TRUE)
This code returns the table associated with the default values in the drop-down boxes, which are team = "ALL", week = "20", pos = "QB", and year = "2015" -- a data frame that contains only 11 observations. If it had actually changed the fields, it would have returned a data frame with 1,695 observations.
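For reference, printing the form before filling it shows which fields and option values rvest actually sees; this is only a quick check, assuming (as in the code above) that the relevant form is the third one on the page:

pgform <- html_form(pgsession)[[3]]
pgform          # prints the form's fields
pgform$fields   # the individual field objects, including the <select> options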
Answer
You can capture the session produced when the form is submitted and use that session as the input to html_nodes():
# parse the table from the submitted session, not from a fresh read_html()
d <- submit_form(session = pgsession, form = filled_form)

y <- d %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(header = TRUE)

dim(y)
#[1] 1695   11
Otherwise, if you use read_html(url), you are reading the original page with the default form values.
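In newer versions of rvest (>= 1.0), html_session(), set_values(), and submit_form() have been superseded by session(), html_form_set(), and session_submit(). A minimal sketch of the same flow with the newer names follows; the form index (3) and the field names are carried over from the code above and may need re-checking against the current page.

library(rvest)

sess <- session("http://www.footballoutsiders.com/stats/snapcounts")
form <- html_form(sess)[[3]]
form <- html_form_set(form,
                      team = "ALL",
                      week = "1",
                      pos  = "ALL",
                      year = "2015")
resp <- session_submit(sess, form)   # keep the response session, as above

snaps <- resp %>%
  html_elements("table") %>%   # html_elements() is the newer name for html_nodes()
  .[[2]] %>%
  html_table(header = TRUE)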