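Is there a way to stop pandas.TimeGrouper() from returning incomplete groups (ts1)? Currently I use the following to work out how many rows belong to the incomplete group, and then drop those rows with .ix (ts2). Is there a better (or built-in) way to do this? This is the only approach I could find.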

import numpy as np
import pandas as pd
pd.__version__

Out [1]: '0.15.0'

rng = pd.date_range('1/1/2013', periods=365, freq='D')
random_numbers = np.arange(0, len(rng))
ts = pd.Series(random_numbers, index=rng)
num_days = 3
num_rows_to_drop = len(rng) % num_days
days = 'D'
timedelta_for_grouping = str(num_days) + days
ts1 = ts.groupby(pd.TimeGrouper(timedelta_for_grouping)).transform('median')
ts2 = ts.groupby(pd.TimeGrouper(timedelta_for_grouping)).transform('median').ix[:-num_rows_to_drop]
print ts1.tail(), ts2.tail()

Out [2]:
2013-12-27    361.0
2013-12-28    361.0
2013-12-29    361.0
2013-12-30    363.5
2013-12-31    363.5
Freq: D, dtype: float64
2013-12-25    358
2013-12-26    358
2013-12-27    361
2013-12-28    361
2013-12-29    361
Freq: D, dtype: float64

Best answer

The simplest way is to filter on the length of each group, using the resampling period as the minimum:

In [47]: g = pd.TimeGrouper(timedelta_for_grouping)

In [48]: ts.groupby(g).filter(lambda x: len(x) >= 3).groupby(g).transform('median')
Out[48]:
2013-01-01     1
2013-01-02     1
2013-01-03     1
2013-01-04     4
2013-01-05     4
2013-01-06     4
2013-01-07     7
2013-01-08     7
2013-01-09     7
2013-01-10    10
2013-01-11    10
2013-01-12    10
2013-01-13    13
2013-01-14    13
2013-01-15    13
...
2013-12-15    349
2013-12-16    349
2013-12-17    349
2013-12-18    352
2013-12-19    352
2013-12-20    352
2013-12-21    355
2013-12-22    355
2013-12-23    355
2013-12-24    358
2013-12-25    358
2013-12-26    358
2013-12-27    361
2013-12-28    361
2013-12-29    361
Freq: D, Length: 363
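
For anyone on a newer pandas, where pd.TimeGrouper and .ix have been deprecated and later removed, the same filter-then-transform idea can be sketched with pd.Grouper. This is only a minimal sketch under that assumption, not part of the original answer; the names grouper, complete and result are just illustrative, and the threshold reuses num_days instead of a hard-coded 3.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2013-01-01", periods=365, freq="D")
ts = pd.Series(np.arange(len(rng)), index=rng)

num_days = 3
# pd.Grouper(freq=...) is assumed here as the stand-in for pd.TimeGrouper
grouper = pd.Grouper(freq=f"{num_days}D")

# Drop every group that has fewer than num_days rows (the trailing partial group),
# then broadcast the per-group median back onto the remaining rows.
complete = ts.groupby(grouper).filter(lambda x: len(x) >= num_days)
result = complete.groupby(grouper).transform("median")

print(result.tail())
```

Because the filter looks at the group length rather than at the position of the rows, it also behaves sensibly when the series divides evenly into groups, a case where slicing with [:-num_rows_to_drop] would end up as [:-0] and return an empty result.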
