Problem description
I have a dataframe with one-hour-long signals, and I want to group them into 10-minute buckets. The problem is that the starting time is not an exact multiple of 10 minutes, so instead of 6 groups I get 7, with the first and the last incomplete.
The issue is easy to reproduce:
import pandas as pd
import numpy as np
import datetime as dt
rng = pd.date_range('1/1/2011 00:05:30', periods=3600, freq='1S')
ts = pd.DataFrame({'a':np.random.randn(len(rng)),'b':np.random.randn(len(rng))}, index=rng)
interval = dt.timedelta(minutes=10)
ts.groupby(pd.Grouper(freq=interval)).apply(len)
2011-01-01 00:00:00 270
2011-01-01 00:10:00 600
2011-01-01 00:20:00 600
2011-01-01 00:30:00 600
2011-01-01 00:40:00 600
2011-01-01 00:50:00 600
2011-01-01 01:00:00 330
Freq: 10T, dtype: int64
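The 270 and 330 counts follow from the bin edges being aligned to multiples of 10 minutes counted from midnight rather than to the first sample. A small sketch of that arithmetic, assuming the same index as above (the names first_edge, last_edge, first_count and last_count are only illustrative):

```python
import pandas as pd

# Same index as in the question: 3600 one-second samples starting at 00:05:30.
rng = pd.date_range('2011-01-01 00:05:30', periods=3600,
                    freq=pd.Timedelta(seconds=1))

# Default bin edges fall on multiples of 10 minutes, so the first and last
# buckets start at the floored timestamps of the first and last samples.
first_edge = rng[0].floor('10min')   # 2011-01-01 00:00:00
last_edge = rng[-1].floor('10min')   # 2011-01-01 01:00:00

# Samples that land in the first (partial) and last (partial) buckets.
first_count = int((rng < first_edge + pd.Timedelta(minutes=10)).sum())
last_count = int((rng >= last_edge).sum())

print(first_edge, first_count)  # 270 samples, from 00:05:30 up to 00:09:59
print(last_edge, last_count)    # 330 samples, from 01:00:00 up to 01:05:29
```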
I tried to solve it with the base argument, as described here, but base seems to take only an integer number of minutes. For the example above (starting 30 s after 00:05), the code below still doesn't work:
ts.groupby(pd.Grouper(freq=interval, base=ts.index[0].minute)).apply(len)
How can I set a generic starting time for the Grouper? My expected output here would be:
2011-01-01 00:05:30 600
2011-01-01 00:15:30 600
2011-01-01 00:25:30 600
2011-01-01 00:35:30 600
2011-01-01 00:45:30 600
2011-01-01 00:55:30 600
Recommended answer
base accepts a float argument. In addition to the minutes, you must also account for the seconds, expressed as a fraction of a minute:
base = ts.index[0].minute + ts.index[0].second/60
ts.groupby(pd.Grouper(freq=interval, base=base)).size()
2011-01-01 00:05:30 600
2011-01-01 00:15:30 600
2011-01-01 00:25:30 600
2011-01-01 00:35:30 600
2011-01-01 00:45:30 600
2011-01-01 00:55:30 600
Freq: 10T, dtype: int64
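Note that base was deprecated in pandas 1.1 and removed in pandas 2.0 in favour of the origin and offset arguments. A minimal sketch of the same grouping for those versions, assuming pandas 1.1 or later is installed; origin='start' anchors the bins at the first timestamp of the index, and the frequencies are passed as timedeltas to avoid version-specific alias strings:

```python
import datetime as dt

import numpy as np
import pandas as pd

# Same data as in the question: one hour of 1-second samples from 00:05:30.
rng = pd.date_range('2011-01-01 00:05:30', periods=3600,
                    freq=pd.Timedelta(seconds=1))
ts = pd.DataFrame({'a': np.random.randn(len(rng)),
                   'b': np.random.randn(len(rng))}, index=rng)

interval = dt.timedelta(minutes=10)

# origin='start' (pandas >= 1.1) starts the first bucket at ts.index[0]
# instead of aligning the bucket edges to midnight.
counts = ts.groupby(pd.Grouper(freq=interval, origin='start')).size()
print(counts)
```

This should give six buckets of 600 samples each, the first labelled 2011-01-01 00:05:30, without having to compute the offset by hand.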