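I am trying to filter out part of the data in the code below.
The data is grouped by Name and Program. If a group contains only one element, I want to keep that row only when FG = 'Y'. For groups that contain both 'N' and 'Y' in the FG column, I want to select the FG = 'Y' row if and only if it was submitted 60 days after the FG = 'N' row.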
from datetime import timedelta
import datetime as dt
from dateutil.parser import parse
import pandas as pd
import numpy as np
data={'Name':['A','A','A','B','B','B','C','D','D','D','E','E','E','F','G','G','G','H','H','H'],'FG':['Y','Y','Y','N','N','Y','Y','Y','Y','Y','Y','N','N','N','Y','N','N','Y','Y','N'],
'Program': ['Eval','Eval','Eval','IB','Eval','IB','PO','PO','Info','IB','Info','Info','Info','Ted', 'Info','Ted','Ted','PO','PO','PO'],
'Date':['2016/10/01','2017/10/01','2016/11/11','2017/10/01','2016/10/01','2017/10/02','2017/10/01','2017/10/01','2017/06/03',
'2017/10/01','2017/10/21','2017/10/21','2017/08/01','2017/10/10', '2017/10/21','2017/08/01','2017/10/10', '2017/04/01','2017/01/30','2017/01/01']}
df=pd.DataFrame(data=data,columns=['Name','FG','Program', 'Date'])
df['Date']=pd.to_datetime(df['Date']).dt.date
df=df.sort_values('Date', ascending=True).drop_duplicates(subset=['Name', 'FG','Program'], keep='last')
df['check']=df.groupby(['Name', 'Program']).Date.transform('min')
df['check']=df['check']+timedelta(60)
mask=df.groupby(['Name','Program']).apply(lambda x : ((x.FG=='Y') & (x.Date>= x.check)) if len(x.Date)>1 else (x.FG=='Y')).values
X=df[mask]
The expected output is:
Name FG Program Date
A Y Eval 2017-10-01
C Y PO 2017-10-01
D Y Info 2017-06-03
D Y PO 2017-10-01
D Y IB 2017-10-01
G Y Info 2017-10-21
H Y PO 2017-04-01
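It seems that the filter in my mask variable is not working. Also, any suggestions on how to compare the dates of the FG = 'N' rows with the dates of the FG = 'Y' rows would be appreciated.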
Best answer
You can use groupby and apply to get the desired result. There is no need to create df.check beforehand:
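def filterer(x):
    y = x.FG.eq('Y')  # rows flagged 'Y' in this (Name, Program) group
    n = x.FG.eq('N')  # rows flagged 'N' in this group
    if n.any():
        # Group contains 'N': keep the 'Y' rows only if they fall more than 60 days after the 'N' rows.
        if y.any() and x.loc[y, 'Date'].min() > x.loc[n, 'Date'].max() + timedelta(60):
            return x.loc[y]
    elif y.any():
        # 'Y'-only group: keep it unchanged.
        return x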
(df.groupby(['Name','Program'])
   .apply(filterer)
   .sort_values(["Name","Date"])
   .reset_index(drop=True)
)
Output:
Name FG Program Date
0 A Y Eval 2017-10-01
1 C Y PO 2017-10-01
2 D Y Info 2017-06-03
3 D Y IB 2017-10-01
4 D Y PO 2017-10-01
5 G Y Info 2017-10-21
6 H Y PO 2017-04-01
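If you would rather avoid apply altogether, the same rule can be sketched with a boolean mask built from a grouped transform. This is only a minimal sketch under a few assumptions: the helper name filter_groups and its days parameter are made up for illustration, Date is converted back to datetime64 inside the helper, and the frame has already been de-duplicated as in the question. It may also be worth knowing that recent pandas releases have deprecated passing the grouping columns to groupby.apply, which a mask-based version sidesteps.

```python
import pandas as pd


def filter_groups(df, days=60):
    """Hypothetical helper: keep 'Y' rows, and in groups that also
    contain 'N' rows keep them only if they are more than `days` days
    after the latest 'N' date of the same (Name, Program) group."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])

    # Latest 'N' date per (Name, Program), broadcast to every row of the
    # group; NaT where the group has no 'N' row at all.
    last_n = (
        out["Date"]
        .where(out["FG"].eq("N"))
        .groupby([out["Name"], out["Program"]])
        .transform("max")
    )

    is_y = out["FG"].eq("Y")
    # Comparisons against NaT are False, so 'Y'-only groups are handled
    # by the explicit isna() branch below.
    late_enough = out["Date"] > last_n + pd.Timedelta(days=days)
    keep = is_y & (last_n.isna() | late_enough)
    return out[keep].sort_values(["Name", "Date"]).reset_index(drop=True)


# A small subset of the question's de-duplicated rows.
rows = [
    ("A", "Y", "Eval", "2017-10-01"),  # 'Y'-only group: kept
    ("B", "N", "IB", "2017-10-01"),
    ("B", "Y", "IB", "2017-10-02"),    # only 1 day after 'N': dropped
    ("F", "N", "Ted", "2017-10-10"),   # 'N'-only group: dropped
    ("H", "N", "PO", "2017-01-01"),
    ("H", "Y", "PO", "2017-04-01"),    # 90 days after 'N': kept
]
sample = pd.DataFrame(rows, columns=["Name", "FG", "Program", "Date"])
result = filter_groups(sample)
print(result)
```

On this subset only the A/Eval row and the H/PO 'Y' row survive, which agrees with the corresponding rows of the output above. The comparison uses a strict > to mirror filterer; switch to >= if a gap of exactly 60 days should also qualify.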
Regarding "python - Pandas: filtering grouped data based on some conditions", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47129901/