I have a large data set like the following:
Date rain code
2009-04-01 0.0 0
2009-04-02 0.0 0
2009-04-03 0.0 0
2009-04-04 0.7 1
2009-04-05 54.2 1
2009-04-06 0.0 0
2009-04-07 0.0 0
2009-04-08 0.0 0
2009-04-09 0.0 0
2009-04-10 0.0 0
2009-04-11 0.0 0
2009-04-12 5.3 1
2009-04-13 10.1 1
2009-04-14 6.0 1
2009-04-15 8.7 1
2009-04-16 0.0 0
2009-04-17 0.0 0
2009-04-18 0.0 0
2009-04-19 0.0 0
2009-04-20 0.0 0
2009-04-21 0.0 0
2009-04-22 0.0 0
2009-04-23 0.0 0
2009-04-24 0.0 0
2009-04-25 4.3 1
2009-04-26 42.2 1
2009-04-27 45.6 1
2009-04-28 12.6 1
2009-04-29 6.2 1
2009-04-30 1.0 1
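I am trying to sum the rain values over each run of consecutive days where code is "1", with a separate total for each run. For example, I want the sum of the rain values from 2009-04-12 to 2009-04-15. So I am looking for a way to identify where code equals 1 on consecutive days and get the sum of rain for each such stretch. Any help with this would be greatly appreciated.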
Best answer
One straightforward solution is to use rle, though I suspect there may be a more "elegant" solution out there.
# assuming dd is your data.frame
dd.rle <- rle(dd$code)
# start position of each run of consecutive 1s
start <- (cumsum(dd.rle$lengths) - dd.rle$lengths + 1)[dd.rle$values == 1]
# length of each run of 1s
ival <- dd.rle$lengths[dd.rle$values == 1]
# using these two, sum the rain values over each run
apply(as.matrix(seq_along(start)), 1, function(idx) {
  sum(dd$rain[start[idx]:(start[idx] + ival[idx] - 1)])
})
# [1] 54.9 30.1 111.9
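To make the start-position arithmetic above easier to follow, here is a minimal sketch on a toy vector. The vector `code` and the names `r`, `ends`, `starts`, `wet`, `run_start`, and `run_end` are made up for illustration and are not part of the original answer.

```r
# Toy vector standing in for dd$code (hypothetical values, not the question's data)
code <- c(0, 0, 1, 1, 0, 1, 1, 1)

r <- rle(code)
# r$lengths is 2 2 1 3 and r$values is 0 1 0 1:
# two 0s, then two 1s, then one 0, then three 1s

# Last index of every run, then first index of every run
ends   <- cumsum(r$lengths)      # 2 4 5 8
starts <- ends - r$lengths + 1   # 1 3 5 6

# Keep only the runs whose value is 1
wet <- r$values == 1
run_start <- starts[wet]         # 3 6
run_end   <- ends[wet]           # 4 8
```

Each pair `run_start[i]:run_end[i]` is then the index range of one run of 1s, which is the same range the answer builds from `start` and `ival`.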
Edit: a simpler approach using rle and tapply.
dd.rle <- rle(dd$code)
# length of each run of consecutive 1s
ival <- dd.rle$lengths[dd.rle$values == 1]
# build a factor with one level per run, each level repeated for the run's length
levl <- factor(rep(seq_along(ival), ival))
# group rain[code == 1] by these levels and sum each group
tapply(dd$rain[dd$code == 1], levl, sum)
# 1 2 3
# 54.9 30.1 111.9
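For anyone who wants to run this end to end, the following is a self-contained sketch that rebuilds the sample data from the question and also reports the start and end date of each run. The names `run_id`, `starts`, `ends`, `keep`, and `result` are illustrative additions, and returning the dates is an extension beyond what the question asked for.

```r
# Rebuild the sample data from the question (30 days of April 2009)
dd <- data.frame(
  Date = seq(as.Date("2009-04-01"), as.Date("2009-04-30"), by = "day"),
  rain = c(0, 0, 0, 0.7, 54.2, 0, 0, 0, 0, 0, 0, 5.3, 10.1, 6.0, 8.7,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 4.3, 42.2, 45.6, 12.6, 6.2, 1.0),
  code = rep(c(0, 1, 0, 1, 0, 1), times = c(3, 2, 6, 4, 9, 6))
)

r <- rle(dd$code)

# One id per run (wet or dry), repeated for the run's length
run_id <- rep(seq_along(r$lengths), r$lengths)

# First and last row index of every run
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# Keep only the runs where code is 1
keep <- r$values == 1
wet  <- dd$code == 1

result <- data.frame(
  start = dd$Date[starts[keep]],
  end   = dd$Date[ends[keep]],
  total = as.numeric(tapply(dd$rain[wet], run_id[wet], sum))
)
print(result)
# totals are 54.9, 30.1 and 111.9, matching the output above
```

Carrying the dates alongside the totals makes it easier to check a result against the raw table, for example that the 30.1 total covers 2009-04-12 to 2009-04-15.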
Regarding "r - getting the sum of consecutive daily values", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15197133/