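Is there a way to prevent pandas.TimeGrouper() from returning incomplete groups (ts1)?
Currently I use the code below to work out how many rows fall in the incomplete group and then drop those rows with .ix (ts2).
Is there a better (or built-in) way to do this? This is the only approach I could find.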
import numpy as np
import pandas as pd
pd.__version__
Out [1]: '0.15.0'
rng = pd.date_range('1/1/2013', periods=365, freq='D')
random_numbers = np.arange(0, len(rng))
ts = pd.Series(random_numbers, index=rng)
num_days = 3
num_rows_to_drop = len(rng) % num_days
days = 'D'
timedelta_for_grouping = str(num_days) + days
ts1 = ts.groupby(pd.TimeGrouper(timedelta_for_grouping)).transform('median')
ts2 = ts.groupby(pd.TimeGrouper(timedelta_for_grouping)).transform('median').ix[:-num_rows_to_drop]
print ts1.tail(), ts2.tail()
Out [2]:
2013-12-27 361.0
2013-12-28 361.0
2013-12-29 361.0
2013-12-30 363.5
2013-12-31 363.5
Freq: D, dtype: float64
2013-12-25 358
2013-12-26 358
2013-12-27 361
2013-12-28 361
2013-12-29 361
Freq: D, dtype: float64
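A side note on the slicing approach above: a negative slice such as `[:-num_rows_to_drop]` returns an empty result when the remainder is 0, that is, when the series length is an exact multiple of `num_days`. The sketch below computes the number of rows to keep instead. The helper name `drop_trailing_partial` is hypothetical, and it assumes the bins start at the first row and the index has no gaps.

```python
import numpy as np
import pandas as pd


def drop_trailing_partial(series, num_days):
    """Hypothetical helper: drop trailing rows that do not fill a whole bin.

    Assumes one row per day and bins that start at the first row.
    """
    keep = len(series) - len(series) % num_days
    return series.iloc[:keep]


ts_365 = pd.Series(np.arange(365),
                   index=pd.date_range("2013-01-01", periods=365, freq="D"))
ts_363 = ts_365.iloc[:363]

trimmed_365 = drop_trailing_partial(ts_365, 3)  # two trailing rows dropped
trimmed_363 = drop_trailing_partial(ts_363, 3)  # nothing to drop, series kept whole

# The negative-slice form fails in the exact-multiple case: [:-0] is [:0].
empty = ts_363.iloc[:-(len(ts_363) % 3)]
```

Slicing by the count to keep behaves the same whether or not there is a remainder.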
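Best answer
The simplest approach is to filter on group length, keeping only groups that have at least as many rows as the resampling period (3 here):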
In [47]: g = pd.TimeGrouper(timedelta_for_grouping)
In [48]: ts.groupby(g).filter(lambda x: len(x) >= 3).groupby(g).transform('median')
Out[48]:
2013-01-01 1
2013-01-02 1
2013-01-03 1
2013-01-04 4
2013-01-05 4
2013-01-06 4
2013-01-07 7
2013-01-08 7
2013-01-09 7
2013-01-10 10
2013-01-11 10
2013-01-12 10
2013-01-13 13
2013-01-14 13
2013-01-15 13
...
2013-12-15 349
2013-12-16 349
2013-12-17 349
2013-12-18 352
2013-12-19 352
2013-12-20 352
2013-12-21 355
2013-12-22 355
2013-12-23 355
2013-12-24 358
2013-12-25 358
2013-12-26 358
2013-12-27 361
2013-12-28 361
2013-12-29 361
Freq: D, Length: 363
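For readers on a recent pandas release, where `pd.TimeGrouper` and `.ix` are no longer available, the same filter-then-transform idea can be sketched with `pd.Grouper(freq=...)`. This is a minimal sketch rather than the answer's original code: the helper `make_grouper` and the names `complete` and `result` are introduced here for illustration, and a fresh grouper is built for each `groupby` call as a precaution.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2013-01-01", periods=365, freq="D")
ts = pd.Series(np.arange(len(rng)), index=rng)

num_days = 3
freq = f"{num_days}D"


def make_grouper():
    # Hypothetical helper: build a new time-based grouper for each groupby call.
    return pd.Grouper(freq=freq)


# Keep only the bins that contain a full num_days worth of rows.
complete = ts.groupby(make_grouper()).filter(lambda x: len(x) >= num_days)

# Median of each remaining bin, broadcast back onto that bin's rows.
result = complete.groupby(make_grouper()).transform("median")

print(result.tail())
```

With 365 daily rows and 3-day bins, the last two days (2013-12-30 and 2013-12-31) form the only short bin, so the result ends on 2013-12-29, matching the output above.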