问题描述
我实际上已经为 SQL 提出了这个问题,并在这里得到了很好的答案:SQL - LAG 获取上一个满足使用多个先前列的条件时的值
I've actually already asked this question for SQL and got a great answer here:SQL - LAG to get previous value if condition using multiple previous columns satisfied
但现在我需要它用于 Pandas.假设我们有一个数据框:
But now I need it for Pandas. Say we have a dataframe:
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
'EventName':['Team A vs Team B',
'Team A vs Team B',
'Team C vs Team D',
'Team Z vs Team A',
'Team A vs Team B',
'Team C vs Team D',
'Team C vs Team D',
'Team E vs Team F',],
'HomeTeam': ['Team A', 'Team A', 'Team C', 'Team Z',
'Team A', 'Team C', 'Team C', 'Team E'],
'Metric':[5,7,6,8,9,3,1,2]})
结果:
id EventName HomeTeam Metric
------------------------------------------
1 Team A vs Team B Team A 5
2 Team A vs Team B Team A 7
3 Team C vs Team D Team C 6
4 Team Z vs Team A Team Z 8
5 Team A vs Team B Team A 9
6 Team C vs Team D Team C 3
7 Team C vs Team D Team C 1
8 Team E vs Team F Team E 2
我想计算一个新列 PreviousMetricN,其中 N 可以是 1、2、3...,它显示了 Metric 的先前值,但前提是 HomeTeam 参与了先前的事件.例如:
I want to calculate a new column PreviousMetricN where N can be 1, 2, 3, ... which shows the previous value for Metric, but only if the HomeTeam was involved in the previous event. For example:
id EventName HomeTeam Metric PreviousMetric1 PreviousMetric2
------------------------------------------------------------------------
1 Team A vs Team B Team A 5 NULL NULL
2 Team A vs Team B Team A 7 5 NULL
3 Team C vs Team D Team C 6 NULL NULL
4 Team Z vs Team A Team Z 8 NULL NULL
5 Team A vs Team B Team A 9 8 7
6 Team C vs Team D Team C 3 6 NULL
7 Team C vs Team D Team C 1 3 6
8 Team E vs Team F Team E 2 NULL NULL
我想使用 for 循环会很容易.但我需要一个矢量化或使用 shift
/groupby
/np.where
组合的解决方案.甚至不确定从哪里开始?
I guess it would be pretty easy with a for loop. But I need a solution that is vectorized or uses some combination of shift
/groupby
/np.where
. Not even sure where to start with this?
推荐答案
使用@Alollz 结构:
Use @Alollz structure:
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
'EventName':['Team A vs Team B',
'Team A vs Team B',
'Team C vs Team D',
'Team Z vs Team A',
'Team A vs Team B',
'Team C vs Team D',
'Team C vs Team D',
'Team E vs Team F',],
'HomeTeam': ['Team A', 'Team A', 'Team C', 'Team Z',
'Team A', 'Team C', 'Team C', 'Team E'],
'Metric':[5,7,6,8,9,3,1,2]})
dfe = df.assign(teams = df['EventName'].str.split(' vs ')).explode('teams')
shifts = [1, 2, 3]
for i in shifts:
mapper = dfe.groupby('teams')['Metric'].shift(i).mask(dfe['teams'] != dfe['HomeTeam']).drop_duplicates()
df[f'PreviousMetrics{i}'] = df.index.map(mapper)
df
输出:
id EventName HomeTeam Metric PreviousMetrics1 PreviousMetrics2 PreviousMetrics3
0 1 Team A vs Team B Team A 5 NaN NaN NaN
1 2 Team A vs Team B Team A 7 5.0 NaN NaN
2 3 Team C vs Team D Team C 6 NaN NaN NaN
3 4 Team Z vs Team A Team Z 8 NaN NaN NaN
4 5 Team A vs Team B Team A 9 8.0 7.0 5.0
5 6 Team C vs Team D Team C 3 6.0 NaN NaN
6 7 Team C vs Team D Team C 1 3.0 6.0 NaN
7 8 Team E vs Team F Team E 2 NaN NaN NaN
这篇关于Pandas shift - 如果满足多个条件,则获取先前的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!