本文介绍了Pandas shift - 如果满足多个条件,则获取先前的值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我实际上已经为 SQL 提出了这个问题,并在这里得到了很好的答案:SQL - LAG 获取上一个满足使用多个先前列的条件时的值

I've actually already asked this question for SQL and got a great answer here:SQL - LAG to get previous value if condition using multiple previous columns satisfied

但现在我需要它用于 Pandas.假设我们有一个数据框:

But now I need it for Pandas. Say we have a dataframe:

df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
                   'EventName':['Team A vs Team B',
                                'Team A vs Team B',
                                'Team C vs Team D',
                                'Team Z vs Team A',
                                'Team A vs Team B',
                                'Team C vs Team D',
                                'Team C vs Team D',
                                'Team E vs Team F',],
                   'HomeTeam': ['Team A', 'Team A', 'Team C', 'Team Z',
                                'Team A', 'Team C', 'Team C', 'Team E'],
                   'Metric':[5,7,6,8,9,3,1,2]})

结果:

id  EventName           HomeTeam    Metric
------------------------------------------
1   Team A vs Team B    Team A      5
2   Team A vs Team B    Team A      7
3   Team C vs Team D    Team C      6
4   Team Z vs Team A    Team Z      8
5   Team A vs Team B    Team A      9
6   Team C vs Team D    Team C      3
7   Team C vs Team D    Team C      1
8   Team E vs Team F    Team E      2

我想计算一个新列 PreviousMetricN,其中 N 可以是 1、2、3...,它显示了 Metric 的先前值,但前提是 HomeTeam 参与了先前的事件.例如:

I want to calculate a new column PreviousMetricN where N can be 1, 2, 3, ... which shows the previous value for Metric, but only if the HomeTeam was involved in the previous event. For example:

id  EventName           HomeTeam    Metric  PreviousMetric1 PreviousMetric2
------------------------------------------------------------------------
1   Team A vs Team B    Team A      5       NULL            NULL
2   Team A vs Team B    Team A      7       5               NULL
3   Team C vs Team D    Team C      6       NULL            NULL
4   Team Z vs Team A    Team Z      8       NULL            NULL
5   Team A vs Team B    Team A      9       8               7
6   Team C vs Team D    Team C      3       6               NULL
7   Team C vs Team D    Team C      1       3               6
8   Team E vs Team F    Team E      2       NULL            NULL

我想使用 for 循环会很容易.但我需要一个矢量化或使用 shift/groupby/np.where 组合的解决方案.甚至不确定从哪里开始?

I guess it would be pretty easy with a for loop. But I need a solution that is vectorized or uses some combination of shift/groupby/np.where. Not even sure where to start with this?

推荐答案

使用@Alollz 结构:

Use @Alollz structure:

df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
                   'EventName':['Team A vs Team B',
                                'Team A vs Team B',
                                'Team C vs Team D',
                                'Team Z vs Team A',
                                'Team A vs Team B',
                                'Team C vs Team D',
                                'Team C vs Team D',
                                'Team E vs Team F',],
                   'HomeTeam': ['Team A', 'Team A', 'Team C', 'Team Z',
                                'Team A', 'Team C', 'Team C', 'Team E'],
                   'Metric':[5,7,6,8,9,3,1,2]})

dfe = df.assign(teams = df['EventName'].str.split(' vs ')).explode('teams')

shifts = [1, 2, 3]
for i in shifts:
    mapper = dfe.groupby('teams')['Metric'].shift(i).mask(dfe['teams'] != dfe['HomeTeam']).drop_duplicates()
    df[f'PreviousMetrics{i}'] = df.index.map(mapper)
df

输出:

   id         EventName HomeTeam  Metric  PreviousMetrics1  PreviousMetrics2  PreviousMetrics3
0   1  Team A vs Team B   Team A       5               NaN               NaN               NaN
1   2  Team A vs Team B   Team A       7               5.0               NaN               NaN
2   3  Team C vs Team D   Team C       6               NaN               NaN               NaN
3   4  Team Z vs Team A   Team Z       8               NaN               NaN               NaN
4   5  Team A vs Team B   Team A       9               8.0               7.0               5.0
5   6  Team C vs Team D   Team C       3               6.0               NaN               NaN
6   7  Team C vs Team D   Team C       1               3.0               6.0               NaN
7   8  Team E vs Team F   Team E       2               NaN               NaN               NaN

这篇关于Pandas shift - 如果满足多个条件,则获取先前的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-03 10:35