Python groupby with a boolean mask

Problem description

I have a pandas dataframe with the following general format:

id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
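
For anyone who wants to reproduce the problem, the sample above can be loaded roughly as follows. This is a minimal sketch: the variable names `SAMPLE_CSV` and `df` are illustrative, and parsing both date columns as datetimes is an assumption, since the question does not state the column dtypes.

```python
import io

import pandas as pd

# Sample rows copied from the question; "nan" marks a missing fix date.
SAMPLE_CSV = """id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""

# read_csv treats the literal string "nan" as missing by default, so
# fix_date becomes a datetime column with NaT where no fix was recorded.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=['orig_date', 'fix_date'])

print(df.dtypes)
print(df)
```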

The desired result is as follows:

id,atr1,atr2,orig_date,fix_date,failed_part_ind
1,bolt,l,2000-01-01,nan,0
1,screw,l,2000-01-01,nan,0
1,stem,l,2000-01-01,nan,0
2,stem,l,2000-01-01,nan,1
2,screw,l,2000-01-01,nan,0
2,stem,l,2001-01-01,2001-01-01,0
3,bolt,r,2000-01-01,nan,1
3,stem,r,2000-01-01,nan,1
3,bolt,r,2001-01-01,2001-01-01,0
3,stem,r,2001-01-01,2001-01-01,0

Any tips or tricks are welcome!

Update 2:

A better way to describe what I need: within each group of a .groupby(['id','atr1','atr2']), create a new indicator column that flags the records meeting the following criterion:

(df['orig_date'] < df['fix_date'])

Recommended answer

I think this should work:

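# Row-wise check. The id/atr1/atr2 terms compare each value with itself,
# so in practice only the date comparison decides the flag.
df['failed_part_ind'] = df.apply(lambda row: 1 if ((row['id'] == row['id']) &
                                                (row['atr1'] == row['atr1']) &
                                                (row['atr2'] == row['atr2']) &
                                                (row['orig_date'] < row['fix_date']))
                                            else 0, axis=1)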
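
To see why a purely row-wise comparison cannot produce the desired flags, the sketch below looks at the two records of the (2, stem, l) group from the sample. It assumes the date columns are datetimes; the names `group`, `row_wise` and `group_wise` are illustrative only. The record that should be flagged has no fix date of its own, so it has to be compared against a fix date taken from another row of the same group.

```python
import pandas as pd

# The two records of the (2, stem, l) group from the sample data.
group = pd.DataFrame({
    'id': [2, 2],
    'atr1': ['stem', 'stem'],
    'atr2': ['l', 'l'],
    'orig_date': pd.to_datetime(['2000-01-01', '2001-01-01']),
    'fix_date': pd.to_datetime([None, '2001-01-01']),
})

# Row-wise: each record is compared with its own fix_date only.
# A comparison against a missing date (NaT) is False, and the fixed
# record has orig_date == fix_date, so nothing is flagged.
row_wise = (group['orig_date'] < group['fix_date']).astype(int)
print(row_wise.tolist())  # [0, 0]

# Group-wise: compare every record with the earliest fix date in the group.
group_wise = (group['orig_date'] < group['fix_date'].min()).astype(int)
print(group_wise.tolist())  # [1, 0]
```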

Update: I think this is what you want:

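import pandas as pd

def f(g):
    g = g.copy()  # work on a copy rather than modifying the group in place
    # Earliest fix date in the group; missing if no record in the group has one.
    min_fix_date = g['fix_date'].min()
    if pd.isna(min_fix_date):  # unlike np.isnan, pd.isna also handles NaT and strings
        g['failed_part_ind'] = 0
    else:
        # Flag records whose original date precedes the earliest fix date.
        g['failed_part_ind'] = (g['orig_date'] < min_fix_date).astype(int)
    return g

# group_keys=False avoids adding the group keys to the result's index.
df = df.groupby(['id', 'atr1', 'atr2'], group_keys=False).apply(f)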
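
The same rule can also be written without `apply`. The following is a sketch of an alternative, not part of the original answer: `transform('min')` broadcasts each group's earliest fix date back onto its rows, and one vectorized comparison builds the indicator. It assumes both date columns are parsed as datetimes; `min_fix` is an illustrative name.

```python
import io

import pandas as pd

csv_text = """id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['orig_date', 'fix_date'])

# Earliest fix date per (id, atr1, atr2) group, aligned to the original rows.
# Groups with no fix date at all get NaT.
min_fix = df.groupby(['id', 'atr1', 'atr2'])['fix_date'].transform('min')

# Comparisons against NaT are False, so groups without a fix stay at 0.
df['failed_part_ind'] = (df['orig_date'] < min_fix).astype(int)

print(df['failed_part_ind'].tolist())  # [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
```

On the sample data this reproduces the `failed_part_ind` column from the question, and it keeps the original row order and index because nothing is regrouped or concatenated.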

08-05 00:33