Problem description
I'm trying to scrape the "Fish Sampled" table data from the Minnesota DNR website using the R rvest package. I used the Chrome extension SelectorGadget to find the XPath for the table, but I'm unable to get any table data from the webpage into R. Any help is appreciated.
library(rvest)
urllakes <- read_html("http://www.dnr.state.mn.us/lakefind/showreport.html?downum=27011700")
lakesnodes <- html_nodes(urllakes, xpath = '//*[(@id = "lake-survey")]')
html_table(lakesnodes, fill = TRUE) # Error: html_name(x) == "table" is not TRUE
html_text(lakesnodes) # [1] "" but no data is returned
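The page fills in that table with JavaScript after it loads, so the HTML that rvest downloads has nothing in it to parse. Open a new tab, open Developer Tools, and then go to http://www.dnr.state.mn.us/lakefind/showreport.html?downum=27011700.
Go to the Network tab and look for the request to http://maps2.dnr.state.mn.us/cgi-bin/lakefinder/detail.cgi with type=lake_survey in its query string.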
That's your target. With the following, you can pass in a MN DNR URL or just the id at the end of the URL and get data back.
library(httr)
library(jsonlite)
read_lake_survey <- function(orig_url_or_id) {
  # only the first element is used if a vector is passed in
  orig_url_or_id <- orig_url_or_id[1]
  # if given a full URL, pull the lake id out of the "downum" query parameter
  if (grepl("^htt", orig_url_or_id)) {
    tmp <- httr::parse_url(orig_url_or_id)
    if (!is.null(tmp$query$downum)) {
      orig_url_or_id <- tmp$query$downum
    } else {
      stop("Invalid URL specified", call.=FALSE)
    }
  }
  # request the JSON endpoint found in the Network tab
  httr::GET(
    url = "http://maps2.dnr.state.mn.us/cgi-bin/lakefinder/detail.cgi",
    query = list(
      type = "lake_survey",
      callback = "",
      id = orig_url_or_id,
      `_` = as.numeric(Sys.time())
    )
  ) -> res
  httr::stop_for_status(res)
  # parse the JSON body into nested lists and data frames
  out <- httr::content(res, as="text", encoding="UTF-8")
  out <- jsonlite::fromJSON(out, flatten=TRUE)
  out
}
Like so:
orig_url <- "http://www.dnr.state.mn.us/lakefind/showreport.html?downum=27011700"
str(read_lake_survey(orig_url), 2)
## List of 4
## $ timestamp: int 1506900750
## $ status : chr "SUCCESS"
## $ result :List of 13
## ..$ averageWaterClarity: chr "7.0"
## ..$ sampledPlants : list()
## ..$ officeCode : chr "F314"
## ..$ littoralAcres : int 76
## ..$ shoreLengthMiles : num 2.45
## ..$ areaAcres : num 152
## ..$ surveys :'data.frame': 6 obs. of 52 variables:
## ..$ accesses :'data.frame': 1 obs. of 5 variables:
## ..$ lakeName : chr "Weaver"
## ..$ DOWNumber : chr "27011700"
## ..$ waterClarity : chr [1, 1:2] "07/14/2008" "7"
## ..$ meanDepthFeet : num 20.7
## ..$ maxDepthFeet : int 57
## $ message : chr "Normal execution."
str(read_lake_survey("27011700"), 2)
## List of 4
## $ timestamp: int 1506900750
## $ status : chr "SUCCESS"
## $ result :List of 13
## ..$ averageWaterClarity: chr "7.0"
## ..$ sampledPlants : list()
## ..$ officeCode : chr "F314"
## ..$ littoralAcres : int 76
## ..$ shoreLengthMiles : num 2.45
## ..$ areaAcres : num 152
## ..$ surveys :'data.frame': 6 obs. of 52 variables:
## ..$ accesses :'data.frame': 1 obs. of 5 variables:
## ..$ lakeName : chr "Weaver"
## ..$ DOWNumber : chr "27011700"
## ..$ waterClarity : chr [1, 1:2] "07/14/2008" "7"
## ..$ meanDepthFeet : num 20.7
## ..$ maxDepthFeet : int 57
## $ message : chr "Normal execution."
str(read_lake_survey("http://example.com"))
## Error: Invalid URL specified
## 3. stop("Invalid URL specified", call. = FALSE)
## 2. read_lake_survey("http://example.com")
## 1. str(read_lake_survey("http://example.com"))
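The URL-or-id handling can be tried on its own, without touching the network. The following is a minimal sketch of just that step; extract_downum is a hypothetical helper name that is not part of the function above, and it only assumes that httr::parse_url() returns the query string as a named list.

```r
library(httr)

# Hypothetical helper (not in the original answer): return the lake id from
# either a full LakeFinder URL or a bare id string.
extract_downum <- function(url_or_id) {
  url_or_id <- url_or_id[1]
  # anything that does not look like a URL is treated as a bare id
  if (!grepl("^htt", url_or_id)) return(url_or_id)
  # parse_url() splits the query string into a named list
  id <- httr::parse_url(url_or_id)$query$downum
  if (is.null(id)) stop("Invalid URL specified", call. = FALSE)
  id
}

id_from_url <- extract_downum("http://www.dnr.state.mn.us/lakefind/showreport.html?downum=27011700")
id_from_id  <- extract_downum("27011700")

# a URL without a downum parameter raises the same error as read_lake_survey()
bad_url_msg <- tryCatch(extract_downum("http://example.com"), error = conditionMessage)
```

Both of the first two calls should give the same id, which is why read_lake_survey() behaves identically for the two input styles shown above.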
You can poke at it to prove it's all there.
library(tidyverse)
# get the data into a variable
dat <- read_lake_survey(orig_url)
# focus on the surveys
surveys <- dat$result$surveys
There are "n" data frames for the surveys that match the popup on the page.
There are also many other list elements with "n" entries that are associated with the surveys in the same popup. I don't do this type of analysis, so I don't know which of them make sense to combine with the data frames.
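One way to see which parts of surveys hold one nested element per survey is to look for list-columns. The sketch below does this on a small made-up JSON string rather than the live response; the field names mirror the ones used in this answer, but the values and the two-survey shape are assumptions for illustration only.

```r
library(jsonlite)

# Made-up JSON mimicking the shape of the real response (values are illustrative)
txt <- '{
  "status": "SUCCESS",
  "result": {
    "lakeName": "Example",
    "surveys": [
      {"surveyID": "s1", "surveyDate": "1980-06-23",
       "fishCatchSummaries": [{"species": "YEB", "totalCatch": 50},
                              {"species": "PMK", "totalCatch": 18}]},
      {"surveyID": "s2", "surveyDate": "1990-07-01",
       "fishCatchSummaries": [{"species": "NOP", "totalCatch": 35}]}
    ]
  }
}'

toy <- jsonlite::fromJSON(txt, flatten = TRUE)
toy_surveys <- toy$result$surveys

# columns that are lists hold one nested object (here a data frame) per survey
list_cols <- names(toy_surveys)[vapply(toy_surveys, is.list, logical(1))]

# number of rows in each survey's nested catch table
rows_per_survey <- vapply(toy_surveys$fishCatchSummaries, nrow, integer(1))
```

Running the same two vapply() calls against the real surveys object should show which of its 52 variables are nested and how many rows each survey contributes.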
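This is likely enough to get you going a bit further. It just adds other elements from the surveys to each catch summary. Note that the final map2_df step assigns to . rather than .x, so survey_id is not actually added (the output has 12 variables and none of them is survey_id); use .x$survey_id there if you want that column.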
map2(surveys$fishCatchSummaries, surveys$surveyDate, ~{ .x$survey_date <- .y ; .x }) %>%
  map2(surveys$surveyType, ~{ .x$survey_type <- .y ; .x }) %>%
  map2(surveys$surveySubType, ~{ .x$survey_subtype <- .y ; .x }) %>%
  map2_df(surveys$surveyID, ~{ .$survey_id <- .y ; .x }) %>%
  as_tibble() %>%
  type_convert() %>%
  glimpse()
## Observations: 120
## Variables: 12
## $ quartileCount <chr> "0.5-7.5", "0.7-4.2", "N/A", "0.4-2.2", "0.9-5.7", "1.5-7.3"...
## $ CPUE <dbl> 25.0, 3.6, 4.0, 0.5, 5.0, 17.5, 6.5, 1.0, 0.8, 0.2, 190.0, 0...
## $ totalCatch <int> 50, 18, 20, 1, 25, 35, 13, 2, 4, 1, 950, 1, 2, 4, 3, 13, 27,...
## $ species <chr> "YEB", "PMK", "HSF", "WTS", "YEB", "NOP", "BLG", "BLC", "BLC...
## $ totalWeight <dbl> 41.75, 2.30, 4.50, 3.50, 24.25, 146.25, 3.25, 0.60, 1.45, 2....
## $ quartileWeight <chr> "0.5-0.8", "0.1-0.2", "N/A", "1.5-2.4", "0.5-0.8", "2.0-3.5"...
## $ averageWeight <dbl> 0.83, 0.13, 0.23, 3.50, 0.97, 4.18, 0.25, 0.30, 0.36, 2.50, ...
## $ gearCount <int> 2, 5, 5, 2, 5, 2, 2, 2, 5, 5, 5, 2, 2, 2, 5, 2, 5, 5, 5, 2, ...
## $ gear <chr> "Standard gill nets", "Standard trap nets", "Standard trap n...
## $ survey_date <date> 1980-06-23, 1980-06-23, 1980-06-23, 1980-06-23, 1980-06-23,...
## $ survey_type <chr> "Standard Survey", "Standard Survey", "Standard Survey", "St...
## $ survey_subtype <chr> "Population Assessment", "Population Assessment", "Populatio...
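The tagging pattern used in that pipeline can be seen more easily on toy data. This is only a sketch: the catches, dates and ids objects are invented stand-ins for surveys$fishCatchSummaries, surveys$surveyDate and surveys$surveyID, and dplyr::bind_rows() is used for the final row-binding. It also shows what happens when the lambda assigns to . instead of .x.

```r
library(purrr)
library(dplyr)

# Invented stand-ins for the nested catch tables and the per-survey values
catches <- list(
  data.frame(species = c("YEB", "PMK"), totalCatch = c(50L, 18L), stringsAsFactors = FALSE),
  data.frame(species = "NOP", totalCatch = 35L, stringsAsFactors = FALSE)
)
dates <- c("1980-06-23", "1990-07-01")
ids   <- c("s1", "s2")

# map2() walks both inputs in parallel: .x is one data frame, .y its survey value
tagged <- map2(catches, dates, ~{ .x$survey_date <- .y ; .x })

# assigning to .x and returning .x keeps the new column
with_x <- map2(tagged, ids, ~{ .x$survey_id <- .y ; .x })

# assigning to . changes a separate local copy, so the returned .x lacks the column
with_dot <- map2(tagged, ids, ~{ .$survey_id <- .y ; .x })

combined     <- bind_rows(with_x)
combined_dot <- bind_rows(with_dot)
```

Here combined ends up with a survey_id column and combined_dot does not, which matches the 12-variable output shown above.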
If you're not familiar with piping, it's just a way to avoid temporary variables. The same steps without the pipe:
tmp <- map2(surveys$fishCatchSummaries, surveys$surveyDate, ~{ .x$survey_date <- .y ; .x })
tmp <- map2(tmp, surveys$surveyType, ~{ .x$survey_type <- .y ; .x })
tmp <- map2(tmp, surveys$surveySubType, ~{ .x$survey_subtype <- .y ; .x })
tmp <- map2_df(tmp, surveys$surveyID, ~{ .$survey_id <- .y ; .x })
tmp <- as_tibble(tmp)
final_data <- type_convert(tmp)
glimpse(final_data)
## Observations: 120
## Variables: 12
## $ quartileCount <chr> "0.5-7.5", "0.7-4.2", "N/A", "0.4-2.2", "0.9-5.7", "1.5-7.3"...
## $ CPUE <dbl> 25.0, 3.6, 4.0, 0.5, 5.0, 17.5, 6.5, 1.0, 0.8, 0.2, 190.0, 0...
## $ totalCatch <int> 50, 18, 20, 1, 25, 35, 13, 2, 4, 1, 950, 1, 2, 4, 3, 13, 27,...
## $ species <chr> "YEB", "PMK", "HSF", "WTS", "YEB", "NOP", "BLG", "BLC", "BLC...
## $ totalWeight <dbl> 41.75, 2.30, 4.50, 3.50, 24.25, 146.25, 3.25, 0.60, 1.45, 2....
## $ quartileWeight <chr> "0.5-0.8", "0.1-0.2", "N/A", "1.5-2.4", "0.5-0.8", "2.0-3.5"...
## $ averageWeight <dbl> 0.83, 0.13, 0.23, 3.50, 0.97, 4.18, 0.25, 0.30, 0.36, 2.50, ...
## $ gearCount <int> 2, 5, 5, 2, 5, 2, 2, 2, 5, 5, 5, 2, 2, 2, 5, 2, 5, 5, 5, 2, ...
## $ gear <chr> "Standard gill nets", "Standard trap nets", "Standard trap n...
## $ survey_date <date> 1980-06-23, 1980-06-23, 1980-06-23, 1980-06-23, 1980-06-23,...
## $ survey_type <chr> "Standard Survey", "Standard Survey", "Standard Survey", "St...
## $ survey_subtype <chr> "Population Assessment", "Population Assessment", "Populatio...
final_data
## # A tibble: 120 x 12
##    quartileCount  CPUE totalCatch species totalWeight quartileWeight averageWeight gearCount gear               survey_date survey_type     survey_subtype
##    <chr>         <dbl>      <int> <chr>         <dbl> <chr>                  <dbl>     <int> <chr>              <date>      <chr>           <chr>
##  1 0.5-7.5        25.0         50 YEB           41.75 0.5-0.8                 0.83         2 Standard gill nets 1980-06-23  Standard Survey Population Assessment
##  2 0.7-4.2         3.6         18 PMK            2.30 0.1-0.2                 0.13         5 Standard trap nets 1980-06-23  Standard Survey Population Assessment
##  3 N/A             4.0         20 HSF            4.50 N/A                     0.23         5 Standard trap nets 1980-06-23  Standard Survey Population Assessment
##  4 0.4-2.2         0.5          1 WTS            3.50 1.5-2.4                 3.50         2 Standard gill nets 1980-06-23  Standard Survey Population Assessment
##  5 0.9-5.7         5.0         25 YEB           24.25 0.5-0.8                 0.97         5 Standard trap nets 1980-06-23  Standard Survey Population Assessment
##  6 1.5-7.3        17.5         35 NOP          146.25 2.0-3.5                 4.18         2 Standard gill nets 1980-06-23  Standard Survey Population Assessment
##  7 N/A             6.5         13 BLG            3.25 N/A                     0.25         2 Standard gill nets 1980-06-23  Standard Survey Population Assessment
##  8 2.5-16.5        1.0          2 BLC            0.60 0.1-0.3                 0.30         2 Standard gill nets 1980-06-23  Standard Survey Population Assessment
##  9 1.8-21.2        0.8          4 BLC            1.45 0.2-0.3                 0.36         5 Standard trap nets 1980-06-23  Standard Survey Population Assessment
## 10 N/A             0.2          1 NOP            2.50 N/A                     2.50         5 Standard trap nets 1980-06-23  Standard Survey Population Assessment
## # ... with 110 more rows
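As an alternative to the chained map2() calls, a list-column of data frames can also be expanded with tidyr::unnest(), which repeats the per-survey values for each nested row. This is a different technique from the one used in the answer, shown here only as a sketch on invented data; on the real surveys object you would probably want to select the id columns and a single list-column first.

```r
library(tibble)
library(tidyr)

# Invented stand-in for a trimmed-down version of the surveys data frame
toy_surveys <- tibble(
  surveyID = c("s1", "s2"),
  surveyDate = c("1980-06-23", "1990-07-01"),
  fishCatchSummaries = list(
    data.frame(species = c("YEB", "PMK"), totalCatch = c(50L, 18L), stringsAsFactors = FALSE),
    data.frame(species = "NOP", totalCatch = 35L, stringsAsFactors = FALSE)
  )
)

# one output row per nested row; surveyID and surveyDate are repeated as needed
flat <- unnest(toy_surveys, cols = fishCatchSummaries)
```

The result has one row per fish catch summary, with the survey-level columns carried along, which is the same shape the map2() approach builds up step by step.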