Removing time series that contain only zero values from a data frame


Question

I have a data frame with multiple time series identified by unique IDs. I would like to remove any time series that contains only 0 values.

The data frame looks like this:

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
B    2010/01/01    0
B    2010/01/02    0
B    2010/01/03    0
B    2010/01/04    0
B    2010/01/05    0
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

I want any time series with only 0 values to be removed, so that the data frame looks as follows:

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40
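To try the answers below, the example can be rebuilt as a data.table. This is a minimal sketch that assumes the data.table package is installed; the question only shows printed text, so storing date as a Date and value as a number is an assumption.

```r
library(data.table)

# Rebuild the question's example: three series of five days each.
# Column types (Date, numeric) are assumed; the question shows only printed text.
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12,
            0, 0, 0, 0, 0,
            45, 46, 0, 0, 40)
)

# Flag series whose values are all zero; only B should be TRUE.
zero_check <- dat[, .(all_zero = all(value == 0)), by = id]
print(zero_check)
```

Note that CCC also contains zeros but is not all zeros, which is why the test has to look at the whole series rather than individual rows.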

This is a follow-up to a previous question that was answered with a really great solution using the data.table package.

Recommended answer

If dat is a data.table, then this is easy to write and read:

dat[, .SD[any(value != 0)], by = id]
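Run against the sample data, the one-liner should keep AAA and CCC and drop B. The sketch below rebuilds the question's table (the Date and numeric column types are assumptions) and applies it. It works because any(value != 0) yields a single TRUE or FALSE per group, so .SD[TRUE] returns the whole group and .SD[FALSE] returns no rows, and groups that return no rows vanish from the result.

```r
library(data.table)

# Question's example data; column types are assumed.
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12, 0, 0, 0, 0, 0, 45, 46, 0, 0, 40)
)

# Per id: .SD[TRUE] keeps every row of the group, .SD[FALSE] keeps none,
# so only groups with at least one non-zero value survive.
res_sd <- dat[, .SD[any(value != 0)], by = id]
print(res_sd)
```

The grouping column id comes first in the result, followed by the columns of .SD, which here matches the original column order.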

.SD stands for Subset of Data: for each group, it holds that group's rows for every column except the grouping column(s). A separate answer explains .SD very well.
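One way to see what .SD holds is to ask each group for its row count and column names. A small sketch on the question's data (column types assumed as before):

```r
library(data.table)

# Question's example data; column types are assumed.
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12, 0, 0, 0, 0, 0, 45, 46, 0, 0, 40)
)

# Inside j, .SD is the current group's slice of dat. The grouping
# column (id) is supplied separately and is not one of .SD's columns.
sd_info <- dat[, .(rows = nrow(.SD),
                   cols = paste(names(.SD), collapse = ",")),
               by = id]
print(sd_info)
```

Each group should report 5 rows and the columns "date,value".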

Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times, which can be a source of typos if you have many long or similar variable names, try:

dat[ave(value != 0, id, FUN = any)]
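The ave() call builds a row-level mask: it applies any() to value != 0 within each id and copies that group's single result back onto every row of the group. The sketch below (column types assumed) prints the mask, applies it as i, and, for comparison, shows the plain data.frame form, which has to name the object three times; that repetition is what the data.table version avoids.

```r
library(data.table)

# Question's example data; column types are assumed.
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12, 0, 0, 0, 0, 0, 45, 46, 0, 0, 40)
)

# One logical per row: TRUE if that row's id has at least one non-zero value.
keep <- dat[, ave(value != 0, id, FUN = any)]
print(keep)

# Using the mask as i keeps the rows in their original order.
res_ave <- dat[ave(value != 0, id, FUN = any)]
print(res_ave)

# Base R equivalent on a plain data.frame (columns spelled out with $).
df <- as.data.frame(dat)
res_df <- df[ave(df$value != 0, df$id, FUN = any), ]
print(res_df)
```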

The speed difference between these two approaches may depend on several factors, including (i) the number of groups, (ii) the size of each group, and (iii) the number of columns in the real dat.
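A rough way to compare the two on your own machine is to time them with system.time() on a synthetic table. The sizes here (2,000 series of 50 rows, 5 filler columns, every third series forced to zero) are arbitrary assumptions; adjust n_groups, group_size and n_extra to probe each of the three factors. Timings depend on hardware and data.table version, so no numbers are claimed.

```r
library(data.table)

set.seed(1)
n_groups   <- 2000L  # factor (i): number of series
group_size <- 50L    # factor (ii): rows per series
n_extra    <- 5L     # factor (iii): extra columns carried along
n_rows     <- n_groups * group_size

big <- data.table(
  id    = rep(sprintf("id%05d", seq_len(n_groups)), each = group_size),
  value = rpois(n_rows, lambda = 1)
)
# Add filler columns so .SD has more to copy per group.
big[, paste0("x", seq_len(n_extra)) := lapply(seq_len(n_extra), function(i) runif(n_rows))]
# Force every third series to be all zeros so there is something to drop.
zero_ids <- unique(big$id)[c(TRUE, FALSE, FALSE)]
big[id %in% zero_ids, value := 0L]

t_sd  <- system.time(res_sd  <- big[, .SD[any(value != 0)], by = id])
t_ave <- system.time(res_ave <- big[ave(value != 0, id, FUN = any)])
print(c(SD = t_sd[["elapsed"]], ave = t_ave[["elapsed"]]))

# Both keep the same rows here because big is already sorted by id;
# the .SD version returns rows grouped by id, while ave() keeps the original order.
stopifnot(identical(res_sd$id, res_ave$id), identical(res_sd$value, res_ave$value))
```

If your real data is not sorted by id, the two results can contain the same rows in a different order.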
