Removing time series that contain only zero values from a data frame

Question

I have a data frame with multiple time series identified by unique ids. I would like to remove any time series that has only 0 values.

The data frame looks as follows:

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
B    2010/01/01    0
B    2010/01/02    0
B    2010/01/03    0
B    2010/01/04    0
B    2010/01/05    0
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

I want any time series with only 0 values to be removed, so that the data frame looks as follows:

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

This is a follow-up to a previous question that was answered with a really great solution using the data.table package.

Recommended answer

If dat is a data.table, then this is easy to write and read:

dat[,.SD[any(value!=0)],by=id]

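.SD stands for Subset of Data: inside a by= call it holds the rows of the current group. Here any(value!=0) is a single TRUE or FALSE per id, so .SD[...] returns either all rows of the group or none of them. This answer explains .SD very well.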
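To make the one-liner easy to try, here is a minimal sketch that rebuilds the example data from the question and applies it. The construction of dat (column types, use of as.Date) is an assumption for illustration; the original post does not show how the data was created.

```r
library(data.table)

# Rebuild the example data from the question (construction is illustrative).
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12,
            0, 0, 0, 0, 0,
            45, 46, 0, 0, 40)
)

# For each id, any(value != 0) is a single TRUE/FALSE.
# Subsetting .SD with TRUE keeps every row of the group;
# with FALSE it returns zero rows, so the group is dropped.
res <- dat[, .SD[any(value != 0)], by = id]

print(unique(res$id))  # "AAA" "CCC"
print(nrow(res))       # 10
```

Group B disappears because all of its values are 0, while CCC is kept in full, including its two zero rows, because at least one of its values is non-zero.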

Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times, which can be a source of typo bugs if you have a lot of long or similar variable names, try:

dat[ ave(value!=0,id,FUN=any) ]
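For comparison, here is a sketch of what the base R form on a plain data.frame might look like. The other answer's exact code is not reproduced in this post, so the version below is an assumption; it is only meant to show where the variable name DF ends up being repeated.

```r
# Example data as a plain data.frame (no data.table needed).
DF <- data.frame(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  value = c(9, 10, 8, 4, 12,
            0, 0, 0, 0, 0,
            45, 46, 0, 0, 40)
)

# ave() applies FUN within each id and returns a vector as long as the
# input, so every row is flagged TRUE if its group has any non-zero value.
# Note that DF has to be written three times on this line.
res <- DF[ave(DF$value != 0, DF$id, FUN = any), ]

print(unique(res$id))  # "AAA" "CCC"
print(nrow(res))       # 10
```

In the data.table form, the expression in i is evaluated with the columns of dat in scope, which is why value and id can be written without a prefix.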

The difference in speed between those two may depend on several factors, including: i) the number of groups, ii) the size of each group, and iii) the number of columns in the real dat.
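One way to check which form is faster for a given shape of data is to time both on a synthetic table. The sketch below is only illustrative: the group count, group size, share of all-zero groups, and the names big, n_groups and group_size are hypothetical, and the timings will vary by machine and data.table version.

```r
library(data.table)

set.seed(1)
n_groups   <- 1000L  # hypothetical: number of time series
group_size <- 50L    # hypothetical: observations per series

big <- data.table(
  id    = rep(seq_len(n_groups), each = group_size),
  value = rpois(n_groups * group_size, lambda = 1)
)

# Force roughly 10% of the groups to be all zero.
zero_ids <- sample(n_groups, n_groups %/% 10L)
big[id %in% zero_ids, value := 0L]

# Time both approaches on the same data.
t_sd  <- system.time(r1 <- big[, .SD[any(value != 0)], by = id])
t_ave <- system.time(r2 <- big[ave(value != 0, id, FUN = any)])

print(c(SD = t_sd[["elapsed"]], ave = t_ave[["elapsed"]]))

# Both approaches should keep the same number of rows.
stopifnot(nrow(r1) == nrow(r2))
```

Changing n_groups, group_size, or adding extra columns to big lets you see how each of the three factors affects the comparison for your own case.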

