Problem description
I have a data frame with multiple time series identified by unique IDs. I would like to remove any time series that has only 0 values.
The data frame looks as follows:
id date value
AAA 2010/01/01 9
AAA 2010/01/02 10
AAA 2010/01/03 8
AAA 2010/01/04 4
AAA 2010/01/05 12
B 2010/01/01 0
B 2010/01/02 0
B 2010/01/03 0
B 2010/01/04 0
B 2010/01/05 0
CCC 2010/01/01 45
CCC 2010/01/02 46
CCC 2010/01/03 0
CCC 2010/01/04 0
CCC 2010/01/05 40
I want any time series with only 0 values to be removed, so that the data frame looks as follows:
id date value
AAA 2010/01/01 9
AAA 2010/01/02 10
AAA 2010/01/03 8
AAA 2010/01/04 4
AAA 2010/01/05 12
CCC 2010/01/01 45
CCC 2010/01/02 46
CCC 2010/01/03 0
CCC 2010/01/04 0
CCC 2010/01/05 40
This is a follow-up to a previous question that was answered with a really great solution using the data.table package.
Recommended answer
If dat is a data.table, then this is easy to write and read:
dat[,.SD[any(value!=0)],by=id]
.SD stands for Subset of Data: inside a grouped call it holds the rows of the current group. This answer explains .SD very well.
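To see why the .SD filter above works, here is a minimal sketch that rebuilds the example data from the question and applies it. It assumes the data.table package is installed; the date column is stored as Date here, which is a choice made for this illustration rather than something the question specifies.

```r
library(data.table)

# Rebuild the example data from the question.
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12,
            0, 0, 0, 0, 0,
            45, 46, 0, 0, 40)
)

# Within each id group, any(value != 0) is a single TRUE or FALSE.
# .SD[TRUE] keeps every row of that group; .SD[FALSE] keeps none.
res <- dat[, .SD[any(value != 0)], by = id]

print(res)
```

Group B contains only zeros, so it drops out. CCC is kept even though it has some zero rows, because at least one of its values is non-zero.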
Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times, which can be a source of typo bugs if you have a lot of long or similar variable names, try:
dat[ ave(value!=0,id,FUN=any) ]
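For comparison, the same ave filter on a plain data.frame might look like the sketch below. This is an assumed base R form, not a quotation of Gabor's answer; the name DF and the reduced two-column data are illustrative. It shows where the data frame's name gets repeated, which the data.table version avoids because column names are resolved inside dat.

```r
# Base R only; DF mirrors the id and value columns from the question.
DF <- data.frame(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  value = c(9, 10, 8, 4, 12,
            0, 0, 0, 0, 0,
            45, 46, 0, 0, 40),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each id group and returns a vector aligned
# with the rows of DF, so each row carries its group's any() result.
keep <- ave(DF$value != 0, DF$id, FUN = any)

# On a data.frame the row filter needs the trailing comma.
filtered <- DF[keep, ]

print(filtered)
```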
The difference in speed between those two may depend on several factors, including: i) the number of groups, ii) the size of each group, and iii) the number of columns in the real dat.
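One rough way to check this on your own data is to time both forms. The sketch below uses synthetic data; the group count, group size and the variable names (big, n_groups, n_per) are hypothetical and chosen only for illustration. The timings will vary by machine and data.table version, so no particular result is implied.

```r
library(data.table)

set.seed(1)
n_groups <- 1000  # hypothetical sizes; vary these to see how timings shift
n_per    <- 100

# Roughly half of the groups are multiplied by 0 and so contain only zeros.
big <- data.table(
  id    = rep(seq_len(n_groups), each = n_per),
  value = rep(rbinom(n_groups, 1, 0.5), each = n_per) *
          rpois(n_groups * n_per, 5)
)

t_sd  <- system.time(r1 <- big[, .SD[any(value != 0)], by = id])
t_ave <- system.time(r2 <- big[ave(value != 0, id, FUN = any)])

# Both approaches should keep the same rows.
stopifnot(nrow(r1) == nrow(r2))

# Elapsed seconds for each approach.
print(c(SD = t_sd[["elapsed"]], ave = t_ave[["elapsed"]]))
```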