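I have a data frame with 3 variables: a POSIXct object "time", a numeric "RRR", and a factor "he" (stored as a plain numeric in the sample below). "RRR" is the amount of liquid precipitation and "he" is the hydrological event number; the time on a row with a non-NA "he" marks the start of that flood event.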
df <- structure(list(time = structure(c(1396879200, 1396922400, 1396976400,
1397008800, 1397095200, 1397332800, 1397354400, 1397397600, 1397451600,
1397484000, 1397527200, 1397786400, 1397959200, 1398002400, 1398024000,
1398132000, 1398175200, 1398218400, 1398261600, 1398369600, 1398466800,
1398477600, 1398520800, 1398564000, 1398607200, 1398747600, 1398780000,
1398909600, 1398952800, 1398974400, 1398996000),
class = c("POSIXct", "POSIXt"),
tzone = ""),
RRR = c(NA, 2, NA, 4, NA, NA, 0.9, 3,
NA, 0.4, 11, NA, 0.5, 1, NA, 13, 4, 0.8, 0.3, NA, NA, 8, 4, 11,
1, NA, 7, 1, 0.4, NA, 4),
he = c(1, NA, 2, NA, 3, 4, NA, NA,
5, NA, NA, 6, NA, NA, 7, NA, NA, NA, NA, 8, 9, NA, NA, NA, NA,
10, NA, NA, NA, 11, NA)),
class = "data.frame",
row.names = c(NA, -31L))
The head of my data frame looks like this:
> df
                 time RRR he
1 2014-04-07 18:00:00  NA  1
2 2014-04-08 06:00:00 2.0 NA
3 2014-04-08 21:00:00  NA  2
4 2014-04-09 06:00:00 4.0 NA
5 2014-04-10 06:00:00  NA  3
6 2014-04-13 00:00:00  NA  4
7 2014-04-13 06:00:00 0.9 NA
8 2014-04-13 18:00:00 3.0 NA
9 2014-04-14 09:00:00  NA  5
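A side note on the printed clock times: the dput stores the timestamps as seconds since 1970-01-01 UTC with tzone = "", so R prints them in the local time zone of whatever session runs the code. Output produced on another machine may therefore show shifted clock times for the same rows, while the differences in hours stay the same. A minimal sketch, assuming only the first two epoch values from the dput; the zone name "Asia/Dubai" is just an illustrative fixed UTC+4 zone, not something stated in the question:

```r
# First two timestamps from the dput, as seconds since 1970-01-01 UTC
x <- as.POSIXct(c(1396879200, 1396922400), origin = "1970-01-01", tz = "UTC")

# The same instants rendered in UTC
utc_txt <- format(x, "%Y-%m-%d %H:%M", tz = "UTC")
print(utc_txt)    # "2014-04-07 14:00" "2014-04-08 02:00"

# ...and in an example zone four hours ahead of UTC (illustrative choice)
print(format(x, "%Y-%m-%d %H:%M", tz = "Asia/Dubai"))

# The elapsed time is a property of the instants, not of the display zone
gap_hours <- as.numeric(difftime(x[2], x[1], units = "hours"))
print(gap_hours)  # 12
```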
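I need to compute the time difference between each "he" value and the time of the last non-NA "RRR" value before it. For example, for he = 2 the expected difference is difftime(df$time[3], df$time[2]), and for he = 4 it should be difftime(df$time[6], df$time[4]). In the end I would like to get a data frame like this, where "diff" is the time difference in hours:
> df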
                 time RRR he diff
1 2014-04-07 18:00:00  NA  1   NA
2 2014-04-08 06:00:00 2.0 NA   NA
3 2014-04-08 21:00:00  NA  2   15
4 2014-04-09 06:00:00 4.0 NA   NA
5 2014-04-10 06:00:00  NA  3   24
6 2014-04-13 00:00:00  NA  4   90
7 2014-04-13 06:00:00 0.9 NA   NA
8 2014-04-13 18:00:00 3.0 NA   NA
9 2014-04-14 09:00:00  NA  5   15
Best answer
I'm sure there must be a simpler way, but this can be done with tidyverse and data.table:
library(tidyverse)   # dplyr, tidyr and tibble verbs used below
library(data.table)  # rleid()

df %>%
  # Make sure "time" is a date-time object (it already is for the dput above)
  mutate(time = as.POSIXct(time, format = "%Y-%m-%d %H:%M:%S")) %>%
  # Fill the NA values in "RRR" with the last non-NA value
  fill(RRR) %>%
  # Group by runs of identical "RRR" values
  group_by(temp = rleid(RRR)) %>%
  # Position of each row within its run
  mutate(temp2 = seq_along(temp)) %>%
  group_by(RRR, temp) %>%
  # Hours between the first row of the run (the non-NA "RRR" reading) and each non-NA "he" row
  mutate(diff = ifelse(!is.na(he), difftime(time, time[temp2 == 1], units = "hours"), NA)) %>%
  ungroup() %>%
  # Drop the helper columns and the filled "RRR"
  select(-temp, -temp2, -RRR) %>%
  # Add row IDs and join the original "RRR" values back in
  rowid_to_column() %>%
  left_join(df %>% rowid_to_column() %>% select(RRR, rowid), by = "rowid") %>%
  select(-rowid)
   time                   he  diff    RRR
   <dttm>              <dbl> <dbl>  <dbl>
 1 2014-04-07 16:00:00    1.    0. NA
 2 2014-04-08 04:00:00   NA    NA   2.00
 3 2014-04-08 19:00:00    2.   15. NA
 4 2014-04-09 04:00:00   NA    NA   4.00
 5 2014-04-10 04:00:00    3.   24. NA
 6 2014-04-12 22:00:00    4.   90. NA
 7 2014-04-13 04:00:00   NA    NA   0.900
 8 2014-04-13 16:00:00   NA    NA   3.00
 9 2014-04-14 07:00:00    5.   15. NA
10 2014-04-14 16:00:00   NA    NA   0.400
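A possibly simpler variant in base R, offered as a sketch rather than as part of the original answer: for every row, track the index of the most recent non-NA "RRR" reading with cummax(), then take the time difference only on rows where "he" is set. The helper name hours_since_last_obs and the nine-row df9 are hypothetical, introduced here for illustration; df9 is rebuilt from the first nine epoch values in the dput. Two behaviours differ from the pipeline above: the first event has no earlier reading and gets NA (as in the desired output) instead of 0, and the result does not depend on consecutive readings having different values, which the rleid() grouping relies on.

```r
# Hypothetical helper (not from the original answer): hours elapsed between each
# event row and the most recent row with a non-NA observation.
hours_since_last_obs <- function(time, obs, event) {
  # Row index of the latest non-NA observation at or before each row (0 = none yet)
  idx <- cummax(seq_along(obs) * (!is.na(obs)))
  idx[idx == 0] <- NA
  elapsed <- as.numeric(difftime(time, time[idx], units = "hours"))
  # Keep the value only on rows that start an event
  ifelse(is.na(event), NA_real_, elapsed)
}

# First nine rows of the question's data, rebuilt from the epoch seconds in the dput
df9 <- data.frame(
  time = as.POSIXct(
    c(1396879200, 1396922400, 1396976400, 1397008800, 1397095200,
      1397332800, 1397354400, 1397397600, 1397451600),
    origin = "1970-01-01", tz = "UTC"
  ),
  RRR = c(NA, 2, NA, 4, NA, NA, 0.9, 3, NA),
  he  = c(1, NA, 2, NA, 3, 4, NA, NA, 5)
)

df9$diff <- hours_since_last_obs(df9$time, df9$RRR, df9$he)
print(df9$diff)  # NA NA 15 NA 24 90 NA NA 15
```

If a row ever carries both a "he" value and an "RRR" reading, this sketch returns 0 for it, because the reading on the same row counts as the most recent one; that case does not occur in the sample data.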
Regarding "r - difftime with the previous non-NA value in another column", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53778402/