Problem description
I have a data frame like this.
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})
   a    b  c
0  1  NaN  1
1  1  2.0  3
2  3  3.0  3
3  3  6.0  9
I would like to have a resulting dataframe like this.
myResults = pd.concat([mydf.groupby('a').apply(lambda x: (x.b / x.c).max()),
                       mydf.groupby('a').apply(lambda x: (x.b / x.c).min())],
                      axis=1)
myResults.columns = ['max', 'min']
        max       min
a
1  0.666667  0.666667
3  1.000000  0.666667
Basically, I would like the max and min of the ratio of column b to column c for each group (grouped by column a).
Is it possible to achieve this with agg? I tried mydf.groupby('a').agg([lambda x: (x.b/x.c).max(), lambda x: (x.b/x.c).min()]). It does not work, and it seems the column names b and c are not recognized.
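This failure is consistent with how DataFrameGroupBy.agg handles a list of functions: each function is called on one column at a time, so x is a single column (a Series) rather than the group's sub-DataFrame, and x.b does not exist. The sketch below uses a hypothetical helper, show_type, that only records the type of whatever agg passes to it:

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

# Hypothetical helper: remember the type of each argument agg hands us
seen_types = []

def show_type(x):
    seen_types.append(type(x).__name__)
    return len(x)

# With a list of functions, agg calls each function separately for
# column b and column c within every group, passing one column at a time.
mydf.groupby('a').agg([show_type])
print(set(seen_types))
```

Because every call sees only one column, a function that combines b and c cannot be written this way; it needs the whole group, which groupby(...).apply provides.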
Another way I can think of is to add the ratio column to mydf first, i.e. mydf['ratio'] = mydf.b/mydf.c, and then use agg on the updated mydf, like mydf.groupby('a')['ratio'].agg(['max', 'min']).
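A runnable sketch of this ratio-column approach, using the same mydf. It swaps the in-place assignment for DataFrame.assign, which returns a new frame, so mydf itself is left unchanged:

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

# assign() returns a copy with an extra 'ratio' column;
# max/min skip the NaN ratio in group a == 1 by default.
result = (
    mydf.assign(ratio=mydf['b'] / mydf['c'])
        .groupby('a')['ratio']
        .agg(['max', 'min'])
)
print(result)
```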
Is there a better way to achieve this with agg or another function? In summary, I would like to apply a custom function to a grouped DataFrame, where the function needs to read multiple columns from the original DataFrame.
Recommended answer
You can use a custom function to achieve this.
The function below can create any number of new columns from any of the input columns.
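import pandas as pd

def f(x):
    # x is the sub-DataFrame for one group, so both b and c are available
    t = {}
    t['max'] = (x['b'] / x['c']).max()
    t['min'] = (x['b'] / x['c']).min()
    # Each key of the returned Series becomes a column of the result
    return pd.Series(t)

mydf.groupby('a').apply(f)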
Output:
        max       min
a
1  0.666667  0.666667
3  1.000000  0.666667
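When the per-group logic is a row-wise expression followed by standard reductions, as it is here, one alternative that avoids calling a Python function per group is to compute the ratio once for all rows and group that Series by column a. This is a sketch of a different technique, not the answerer's code:

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

# Vectorized ratio over all rows, then group that Series by the key column
ratio = mydf['b'] / mydf['c']
result = ratio.groupby(mydf['a']).agg(['max', 'min'])
print(result)
```

apply remains the more general tool when the per-group function needs logic that cannot be written as a column expression plus built-in reductions; the vectorized form typically scales better with many groups.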