Problem description
I'm analyzing river streamflow data in R and have two nested lists. The first, Flowtest, holds data from different river reaches, identified by numbers such as 910, 950, 1012 and 1087. I have hundreds of daily streamflow measurements (Flow), but because I'm preparing yearly statistics, the exact day and month don't matter. Each measurement (Flow) is referenced to a year (Year) in the Flowtest table.
library(tibble)  # provides tibble()

Flowtest <- list("910" = tibble(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008), Flow = c(123, 170, 187, 245, 679, 870, 820)),
                 "950" = tibble(Year = c(2004, 2005, 2005, 2005, 2006, 2008, 2008), Flow = c(570, 450, 780, 650, 230, 470, 340)),
                 "1012" = tibble(Year = c(2005, 2005, 2005, 2005, 2007, 2008, 2008), Flow = c(160, 170, 670, 780, 350, 840, 850)),
                 "1087" = tibble(Year = c(2004, 2005, 2005, 2007, 2007, 2008, 2008), Flow = c(120, 780, 820, 580, 870, 870, 840)))
The second nested list, RCHtest, serves as a lookup table. I calculated the 75th percentile (Q3, the 0.75 quantile) on a different streamflow dataset than Flowtest, so I don't want to use a Q3 calculated from Flowtest itself. This gives me a Q3 threshold for each of the analyzed years (Year). The analyzed years and river reaches are the same in Flowtest and RCHtest.
RCHtest <- list("910" = data.frame(Year = c(2004:2008), Q3=c(650, 720, 550, 580, 800)),
"950" = data.frame(Year = c(2004:2008), Q3=c(550, 770, 520, 540, 790)),
"1012" = data.frame(Year = c(2004:2008), Q3=c(600, 780, 500, 570, 800)),
"1087" = data.frame(Year = c(2004:2008), Q3=c(670, 790, 510, 560, 780)))
What I would like to obtain is the number of values in Flowtest$Flow that fall above the threshold specified in RCHtest$Q3, per Year and per subbasin, as shown in Resulttest below.
Resulttest <- list("910" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 0, 0, 1, 2)),
"950" = data.frame(Year = c(2004:2008), aboveQ3=c(1, 1, 0, 0, 0)),
"1012" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 2, 0, 0, 2)),
"1087" = data.frame(Year = c(2004:2008), aboveQ3=c(0, 1, 0, 2, 2)))
How should I approach this? Please help.
Recommended answer
You can use Map together with aggregate:
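Map(function(x, y) {
  # Join flows to thresholds by Year; all = TRUE keeps years without measurements
  merged <- merge(x, y, all = TRUE)
  # Count flows above Q3 per year; 'na.pass' keeps the NA rows so empty years give 0
  aggregate(Flow > Q3 ~ Year, merged, sum, na.rm = TRUE, na.action = 'na.pass')
}, Flowtest, RCHtest)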
This returns:
#$`910`
# Year Flow > Q3
#1 2004 0
#2 2005 0
#3 2006 0
#4 2007 1
#5 2008 2
#$`950`
# Year Flow > Q3
#1 2004 1
#2 2005 1
#3 2006 0
#4 2007 0
#5 2008 0
#$`1012`
# Year Flow > Q3
#1 2004 0
#2 2005 0
#3 2006 0
#4 2007 0
#5 2008 2
#$`1087`
# Year Flow > Q3
#1 2004 0
#2 2005 1
#3 2006 0
#4 2007 2
#5 2008 2
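To see why all = TRUE and na.action = na.pass are both needed, it may help to walk through a single reach by hand. The sketch below rebuilds reach 910 from the question as plain data frames, using base R only. The names flow910, q3_910, merged and counts are made up for illustration.

```r
# Reach 910 from the question, rebuilt as plain data frames
flow910 <- data.frame(Year = c(2004, 2004, 2005, 2005, 2007, 2008, 2008),
                      Flow = c(123, 170, 187, 245, 679, 870, 820))
q3_910  <- data.frame(Year = 2004:2008,
                      Q3   = c(650, 720, 550, 580, 800))

# all = TRUE keeps 2006, which has a threshold but no measurements (Flow is NA)
merged <- merge(flow910, q3_910, all = TRUE)

# na.pass stops aggregate() from dropping the NA row before grouping;
# na.rm = TRUE then lets sum() report zero exceedances for that year
counts <- aggregate(Flow > Q3 ~ Year, merged, sum,
                    na.rm = TRUE, na.action = na.pass)
counts
```

Without na.action = na.pass, aggregate() would fall back to its default na.omit, and 2006 would be missing from the result instead of appearing with a count of 0.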
If you want to do this using tidyverse functions, you can do:
library(dplyr)
library(purrr)

# For each reach: join flows to thresholds by Year, then count exceedances per year
map2(Flowtest, RCHtest, ~ full_join(.x, .y, by = "Year") %>%
       group_by(Year) %>%
       summarise(sum = sum(Flow > Q3, na.rm = TRUE)))
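Both approaches name the count column differently from the aboveQ3 column in Resulttest. One possible way to match that layout is sketched below in base R, for reach 1012 only. The helper name count_above is hypothetical and not part of any package.

```r
# Hypothetical helper: count flows strictly above the yearly threshold
count_above <- function(flow, q3) {
  merged <- merge(flow, q3, all = TRUE)
  out <- aggregate(Flow > Q3 ~ Year, merged, sum,
                   na.rm = TRUE, na.action = na.pass)
  # Rename the columns to match the layout of Resulttest
  setNames(out, c("Year", "aboveQ3"))
}

# Reach 1012 from the question, as plain data frames
flow1012 <- data.frame(Year = c(2005, 2005, 2005, 2005, 2007, 2008, 2008),
                       Flow = c(160, 170, 670, 780, 350, 840, 850))
q3_1012  <- data.frame(Year = 2004:2008,
                       Q3   = c(600, 780, 500, 570, 800))

res1012 <- count_above(flow1012, q3_1012)
res1012
```

For the full lists, the same helper could be applied as Map(count_above, Flowtest, RCHtest). Note that the comparison is strict: for reach 1012 the 2005 reading of 780 equals the threshold and is not counted, so that year comes out as 0 rather than the 2 listed in Resulttest.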