I have the following dataset:
import pandas as pd

lst=[
['a','2019-03-02 23:20:28',0],
['a','2019-03-02 23:21:29',0],
['a','2019-03-02 23:22:30',1],
['a','2019-03-02 23:30:31',0.5],
['a','2019-03-02 23:38:32',0.5],
['a','2019-03-02 23:50:32',0.5],
['a','2019-03-02 23:50:32',0],
['b','2019-03-02 23:10:32',0],
['b','2019-03-02 23:12:32',0],
['b','2019-03-02 23:20:32',1],
['b','2019-03-02 23:30:32',0.5],
['b','2019-03-02 23:50:32',1],
['b','2019-03-02 23:55:32',1],
['b','2019-03-02 23:56:32',0],
['a','2019-03-02 22:20:28',0],
['a','2019-03-02 22:21:29',0],
['a','2019-03-02 22:22:30',1],
['a','2019-03-02 22:30:31',0.5],
['a','2019-03-02 22:30:32',0],
]
df = pd.DataFrame(lst,columns=['ID','ts','signal'])
df['ts']=pd.to_datetime(df['ts'])
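For each ID, I want to get all rows where the total elapsed time between the 0s in the signal column is greater than 15 minutes.
That is, only:
and, for b, only: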
Best answer
Use:
# filter out rows with 0
df1 = df[df['signal'].ne(0)]
# label runs of consecutive non-0 rows: the counter increments on every 0, so rows between two 0s share a label (computed on the original column)
a = df['signal'].eq(0).cumsum()
thr = pd.Timedelta(15, unit='min')
# elapsed time between the first and last timestamp of each run; keep only runs longer than the threshold
df2 = df1[df1['ts'].groupby(a).transform(lambda x: x.iat[-1] - x.iat[0]) > thr]
print(df2)
   ID                  ts  signal
2   a 2019-03-02 23:22:30     1.0
3   a 2019-03-02 23:30:31     0.5
4   a 2019-03-02 23:38:32     0.5
5   a 2019-03-02 23:50:32     0.5
9   b 2019-03-02 23:20:32     1.0
10  b 2019-03-02 23:30:32     0.5
11  b 2019-03-02 23:50:32     1.0
12  b 2019-03-02 23:55:32     1.0
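The answer groups only by the zero-run label, which works on this data because every ID's block of rows starts and ends with a 0. A minimal sketch of a variant (an assumption, not part of the original answer) that also groups by `ID`, so runs from different IDs can never be merged, and replaces the Python lambda with the built-in `transform('min')`/`transform('max')`. Like the answer, it measures from the first to the last non-zero row of each run, not from the surrounding 0s; the names `run`, `nz`, `g`, `elapsed` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question.
lst = [
    ['a', '2019-03-02 23:20:28', 0],
    ['a', '2019-03-02 23:21:29', 0],
    ['a', '2019-03-02 23:22:30', 1],
    ['a', '2019-03-02 23:30:31', 0.5],
    ['a', '2019-03-02 23:38:32', 0.5],
    ['a', '2019-03-02 23:50:32', 0.5],
    ['a', '2019-03-02 23:50:32', 0],
    ['b', '2019-03-02 23:10:32', 0],
    ['b', '2019-03-02 23:12:32', 0],
    ['b', '2019-03-02 23:20:32', 1],
    ['b', '2019-03-02 23:30:32', 0.5],
    ['b', '2019-03-02 23:50:32', 1],
    ['b', '2019-03-02 23:55:32', 1],
    ['b', '2019-03-02 23:56:32', 0],
    ['a', '2019-03-02 22:20:28', 0],
    ['a', '2019-03-02 22:21:29', 0],
    ['a', '2019-03-02 22:22:30', 1],
    ['a', '2019-03-02 22:30:31', 0.5],
    ['a', '2019-03-02 22:30:32', 0],
]
df = pd.DataFrame(lst, columns=['ID', 'ts', 'signal'])
df['ts'] = pd.to_datetime(df['ts'])

thr = pd.Timedelta(minutes=15)

# Run label: the counter increments on every 0, so consecutive non-zero
# rows share a label (computed on the full column, as in the answer).
run = df['signal'].eq(0).cumsum()

# Keep only the non-zero rows.
nz = df[df['signal'].ne(0)]

# Group by (ID, run label) so rows of different IDs never share a group.
g = nz.groupby([nz['ID'], run.loc[nz.index]])['ts']

# Elapsed time from the earliest to the latest non-zero row of each run,
# broadcast back to every row of that run.
elapsed = g.transform('max') - g.transform('min')

result = nz[elapsed > thr]
print(result)
```

On the sample data this keeps the same rows as the answer (index 2 to 5 for a, 9 to 12 for b); the last run of a (22:22:30 to 22:30:31, about 8 minutes) is dropped.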
Regarding "python - groupby elapsed time of a datetime column based on a condition in another column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55261049/