Python PANDAS: GroupBy First Transform Create Indicator

Problem description

I have a pandas dataframe in the following format:

id,criteria_1,criteria_2,criteria_3,criteria_4,criteria_5,criteria_6
1,0,0,95,179,1,1
1,0,0,97,185,NaN,1
1,1,2,92,120,1,1
2,0,0,27,0,1,NaN
2,1,2,90,179,1,1
2,2,5,111,200,1,1
3,1,2,91,175,1,1
3,0,8,90,27,NaN,NaN
3,0,0,22,0,NaN,NaN
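
To experiment with the sample above, it can be loaded into a dataframe. This is a minimal sketch that assumes the data is available as CSV text; the variable name `CSV_TEXT` is only illustrative and is not part of the original question.

```python
import io

import pandas as pd

# The sample rows from the question, pasted as CSV text (illustrative name).
CSV_TEXT = """id,criteria_1,criteria_2,criteria_3,criteria_4,criteria_5,criteria_6
1,0,0,95,179,1,1
1,0,0,97,185,NaN,1
1,1,2,92,120,1,1
2,0,0,27,0,1,NaN
2,1,2,90,179,1,1
2,2,5,111,200,1,1
3,1,2,91,175,1,1
3,0,8,90,27,NaN,NaN
3,0,0,22,0,NaN,NaN
"""

# read_csv treats the literal string "NaN" as a missing value by default,
# so criteria_5 and criteria_6 become float columns containing NaN.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```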

I have the following working code:

df_final = df[((df['criteria_1'] >=1.0) | (df['criteria_2'] >=2.0)) &
               (df['criteria_3'] >=90.0) &
               (df['criteria_4'] <=180.0) &
              ((df['criteria_5'].notnull()) & (df['criteria_6'].notnull()))].groupby('id').first()

Which results in:

id,criteria_1,criteria_2,criteria_3,criteria_4,criteria_5,criteria_6
1,1,2,92,120,1,1
2,1,2,90,179,1,1
3,1,2,91,175,1,1
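
One thing worth noticing about this result is that it is indexed by `id`, so the original row labels are gone, which makes it awkward to map back onto the original dataframe. The sketch below rebuilds the sample data by hand (an assumption about how the frame is constructed) and prints both sets of labels for comparison.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "id":         [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "criteria_1": [0, 0, 1, 0, 1, 2, 1, 0, 0],
    "criteria_2": [0, 0, 2, 0, 2, 5, 2, 8, 0],
    "criteria_3": [95, 97, 92, 27, 90, 111, 91, 90, 22],
    "criteria_4": [179, 185, 120, 0, 179, 200, 175, 27, 0],
    "criteria_5": [1, np.nan, 1, 1, 1, 1, 1, np.nan, np.nan],
    "criteria_6": [1, 1, 1, np.nan, 1, 1, 1, np.nan, np.nan],
})

# Same filter as in the question.
mask = (((df["criteria_1"] >= 1.0) | (df["criteria_2"] >= 2.0)) &
        (df["criteria_3"] >= 90.0) &
        (df["criteria_4"] <= 180.0) &
        df["criteria_5"].notnull() & df["criteria_6"].notnull())

df_final = df[mask].groupby("id").first()

print(df_final.index.tolist())   # group keys: [1, 2, 3]
print(df[mask].index.tolist())   # original row labels of matching rows: [2, 4, 6]
```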

However, I would like to create a new Boolean indicator flag column to indicate which rows meet the criteria (result of above groupby) on the original dataframe using .transform().

Originally, I thought I could use a combination of .first().transform('any').astype(int), but I don't think that will work. If there is a cleaner way to do this, that would be great as well.

Recommended answer

Here is one way:

mask = (((df['criteria_1'] >=1.0) | (df['criteria_2'] >=2.0)) &
         (df['criteria_3'] >=90.0) &
         (df['criteria_4'] <=180.0) &
         ((df['criteria_5'].notnull()) & (df['criteria_6'].notnull())))

# reset_index() defaults to drop=False. It inserts the old index into the DF
# as a new column named 'index'.
idx = df.reset_index()[mask].groupby('id').first().reset_index(drop=True)['index']

df['flag'] = df.index.isin(idx).astype(int)
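
As a possible alternative that stays on the original dataframe, the first matching row per `id` can also be found with a grouped cumulative sum of the mask. This is only a sketch and not part of the original answer: the names `csv_text` and `first_match` are illustrative, and it assumes that "first" means first in the existing row order.

```python
import io

import pandas as pd

# Sample data from the question (illustrative variable name).
csv_text = """id,criteria_1,criteria_2,criteria_3,criteria_4,criteria_5,criteria_6
1,0,0,95,179,1,1
1,0,0,97,185,NaN,1
1,1,2,92,120,1,1
2,0,0,27,0,1,NaN
2,1,2,90,179,1,1
2,2,5,111,200,1,1
3,1,2,91,175,1,1
3,0,8,90,27,NaN,NaN
3,0,0,22,0,NaN,NaN
"""
df = pd.read_csv(io.StringIO(csv_text))

mask = (((df["criteria_1"] >= 1.0) | (df["criteria_2"] >= 2.0)) &
        (df["criteria_3"] >= 90.0) &
        (df["criteria_4"] <= 180.0) &
        df["criteria_5"].notnull() & df["criteria_6"].notnull())

# Running count of matching rows within each id. The count reaches 1 on the
# first match and stays at 1 on later non-matching rows, so it is combined
# with the mask again to keep only the matching row itself.
running = mask.astype(int).groupby(df["id"]).cumsum()
first_match = mask & running.eq(1)

df["flag"] = first_match.astype(int)
print(df[["id", "flag"]])
```

Because the cumulative sum returns one value per input row, the result lines up with the original index and no `reset_index()` round trip is needed. For the sample data, rows 2, 4 and 6 are flagged, which matches the three rows returned by the groupby in the question.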

08-11 19:42