Problem description
I've got a data.frame of monthly values of a variable for many locations (so many rows), and I want to count the number of consecutive months (i.e. consecutive cells) that have a value of zero. This would be easy if each row were simply read left to right, but the added complication is that the end of the year is consecutive to the start of the year.
For example, in the shortened example dataset below (with seasons instead of months), location 1 has 3 consecutive '0' months, location 2 has 2, and location 3 has none.
df <- cbind(location = c(1, 2, 3),
            Winter = c(0, 0, 3),
            Spring = c(0, 2, 4),
            Summer = c(0, 2, 7),
            Autumn = c(3, 0, 4))
How can I count these consecutive zero values? I've looked at rle but I'm still none the wiser currently!
Many thanks for any help :)
Recommended answer
You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate each case and take the max, like so:
df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))
#>      Winter Spring Summer Autumn
#> [1,]      0      0      0      3
#> [2,]      0      2      2      0
#> [3,]      3      4      7      4
# calculate the number of consecutive zeros at the start and end
startZeros <- apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros <- apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0
# calculate the longest run of zeros
longestRun <- apply(df, 1, function(x) {
  y <- rle(x)
  # include 0 so max() still works when a row contains no zeros
  max(y$lengths[y$values == 0], 0)
})
#> [1] 3 1 0
# take the max of the two values
pmax(longestRun, startZeros + endZeros)
#> [1] 3 2 0
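Since the question mentions being unsure about rle, here is a small illustration of what it returns for a single row of the example. The names x and r are only for this sketch and are not part of the answer above.

```r
# Second row of the example: zeros at both ends, non-zeros in between
x <- c(0, 2, 2, 0)

# rle() collapses the vector into runs of equal values
r <- rle(x)
r$values    # the value of each run:  0 2 0
r$lengths   # the length of each run: 1 2 1

# Keep only the runs whose value is 0 and take the longest one.
# The extra 0 means a row without any zeros gives 0 instead of a warning.
longest_inside <- max(r$lengths[r$values == 0], 0)
longest_inside  # 1: the wrap-around run of length 2 is not seen by rle alone
```

This is why the start and end counts are added separately: rle on its own treats the first and last cells as unrelated.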
Of course, an even easier solution is to append each row to itself, so that a run wrapping from the end to the start becomes one contiguous run:
longestRun <- apply(cbind(df, df), # trick: wrap the zeros from the end around to the start
                    1,             # the margin over which to apply the summary function
                    function(x) {  # the summary function
                      y <- rle(x)
                      max(y$lengths[y$values == 0],
                          0)       # include zero in case there are no zeros in y$values
                    })
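One edge case worth checking with the doubled-row trick: a row that is zero in every column would be counted twice (8 instead of 4 in the seasonal example). A minimal sketch of a capped variant follows; the function name longest_zero_run_wrapped is hypothetical and the cap is my addition, not part of the original answer.

```r
# Hypothetical helper: longest run of zeros in x, treating x as circular
longest_zero_run_wrapped <- function(x) {
  runs <- rle(c(x, x) == 0)                     # runs of TRUE (zero) / FALSE
  longest <- max(runs$lengths[runs$values], 0)  # 0 if there are no zeros at all
  min(longest, length(x))                       # a run cannot exceed one full cycle
}

df <- cbind(
  Winter = c(0, 0, 3),
  Spring = c(0, 2, 4),
  Summer = c(0, 2, 7),
  Autumn = c(3, 0, 4))

result <- apply(df, 1, longest_zero_run_wrapped)
result                                      # 3 2 0
longest_zero_run_wrapped(c(0, 0, 0, 0))     # 4, not 8
```

If no location can ever be zero all year round, the cap makes no difference and the shorter version is enough.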
Note that the above solution works because my df does not include the location field (column).
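If the data do carry a location column, as in the question, it has to be dropped before counting. Otherwise it would sit between Autumn and Winter in the doubled row and break the wrap-around, and a location coded as 0 would be counted as a zero month. The sketch below assumes the column is literally named "location"; the names with_location, values_only and zero_runs are illustrative.

```r
with_location <- cbind(
  location = c(1, 2, 3),
  Winter   = c(0, 0, 3),
  Spring   = c(0, 2, 4),
  Summer   = c(0, 2, 7),
  Autumn   = c(3, 0, 4))

# Keep only the value columns; drop = FALSE keeps a matrix even with one column left
values_only <- with_location[, colnames(with_location) != "location", drop = FALSE]

# Same doubled-row idea as above, applied to the value columns only
zero_runs <- apply(cbind(values_only, values_only), 1, function(x) {
  y <- rle(x)
  max(y$lengths[y$values == 0], 0)
})

# Re-attach the location identifiers to the counts
out <- data.frame(location = with_location[, "location"], zero_runs = zero_runs)
out
```

Note that cbind on plain vectors builds a matrix rather than a data.frame. With a real, all-numeric data.frame the same column selection and apply call should work, since apply converts its input to a matrix first.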