Multiple aggregations of the same column

Problem description

Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times?

Example dataframe:

import datetime as dt

import numpy as np
import pandas as pd

# Seed NumPy directly; the pd.np alias was deprecated in pandas 1.0 and later removed.
np.random.seed(0)
df = pd.DataFrame({
    "date": [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns": 0.05 * np.random.randn(10),
    "dummy": np.repeat(1, 10),
})

The intuitive, but incorrect, way to do it would be:

# Assume `f1` and `f2` are defined for aggregating.
df.groupby("dummy").agg({"returns": f1, "returns": f2})

Python dictionaries cannot hold duplicate keys, so the second entry silently replaces the first. Is there another way to express the input to agg()? Perhaps a list of tuples [(column, function)] would work better, to allow multiple functions to be applied to the same column? But agg() seems to accept only a dictionary.

Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would this work with aggregation anyway?)

Recommended answer

You can simply pass the functions as a list:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
Out[20]:
           mean       sum
dummy
1      0.036901  0.369012
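A related variant, sketched here as an illustration rather than as part of the original answer: selecting the column before aggregating and passing a list of aggregation names. With this form the result has flat column names taken from the functions, whereas the dictionary form in recent pandas versions may nest them under a `returns` level. The data setup repeats the question's example; the variable name `flat` is only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "returns": 0.05 * np.random.randn(10),
    "dummy": np.repeat(1, 10),
})

# Select the column first, then pass a list of aggregation names.
# Each name becomes one column of the resulting DataFrame.
flat = df.groupby("dummy")["returns"].agg(["mean", "sum"])
print(flat.columns.tolist())  # ['mean', 'sum']
```

String names such as "mean" and "sum" refer to pandas' own group-wise implementations, so no NumPy function needs to be passed here.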

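Or as a dictionary, which also names the output columns. Note that this nested-dictionary form was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError: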

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})
Out[21]:
        returns
           Mean       Sum
dummy
1      0.036901  0.369012
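For current pandas versions, a minimal sketch of the equivalent using named aggregation (added in pandas 0.25) might look like the following. The data setup repeats the question's example, and `Mean` and `Sum` are simply the output column names used above; the variable name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "returns": 0.05 * np.random.randn(10),
    "dummy": np.repeat(1, 10),
})

# Named aggregation: each keyword is an output column name, and each
# value is a (source column, aggregation) pair.
result = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(result)
```

Because every output column is given its own keyword, the same source column can appear as many times as needed, which sidesteps the duplicate-key problem from the question.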

07-25 02:36