我有以下数据框:
df =
Date Team1 Team2
6/1 Boston New York
6/13 New York Chicago
6/27 Boston New York
6/28 Chicago Boston
我想创建一个新列,该列根据team1的条件查找日期差异。例如)当芝加哥是Team1时,我想知道自上次比赛以来的天数,无论他们在上一场比赛中是Team1还是Team2。
df =
Date Team1 Team2 Days since Team1 played
6/1 Boston New York 0
6/13 New York Chicago 12
6/27 Boston New York 26
6/28 Chicago Boston 15
最佳答案
您的预期输出接近,但我会创建一个多索引
使用melt
和diff
然后使用pivot
# melt to get Teams as one columns
melt = df.melt('Date').sort_values('Date')
# groupby and find the difference
melt['diff'] = melt.groupby('value')['Date'].diff()
# pivot to go back to the original df format
melt.pivot('Date','variable')
value diff
variable Team1 Team2 Team1 Team2
Date
2018-06-01 Boston New York NaT NaT
2018-06-13 New York Chicago 12 days NaT
2018-06-27 Boston New York 26 days 14 days
2018-06-28 Chicago Boston 15 days 1 days
更新资料
这是您的评论的更新:
# assume this df
Date Team1 Team2
0 2018-06-01 Boston New York
1 2018-06-13 New York Chicago
2 2018-06-27 Boston New York
3 2018-06-28 Chicago Boston
4 2018-06-28 New York Detroit
码:
# melt df (same as above example)
melt = df.melt('Date').sort_values('Date')
# find the difference
melt['diff'] = melt.groupby('value')['Date'].diff()
# use pivot_table not pivot
piv = melt.pivot_table(index=['Date', 'diff'], columns='variable', values='value', aggfunc=lambda x:x)
# reset index and dropna from team 1
piv.reset_index(level=1, inplace=True)
piv = piv[~piv['Team1'].isna()]
# merge your original df and your new one together
pd.merge(df, piv[piv.columns[:-1]], on=['Date','Team1'], how='outer').fillna(0)
Date Team1 Team2 diff
0 2018-06-01 Boston New York 0 days
1 2018-06-13 New York Chicago 12 days
2 2018-06-27 Boston New York 26 days
3 2018-06-28 Chicago Boston 15 days
4 2018-06-28 New York Detroit 1 days
请注意,这次的差异只是与Team1上一次比赛的时间不同