Problem description
I am trying to generate a pivot table with multiple "values" columns. I know I can use aggfunc to aggregate values the way I want, but what if I don't want to sum or average both columns, and instead want the sum of one column and the mean of the other? Is this possible with pandas?
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three'] * 6,
    'B': ['A', 'B', 'C'] * 8,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
    'D': np.random.randn(24),
    'E': np.random.randn(24)
})
Now this will get a pivot table with the sum:
pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.sum)
And this one gives the mean:
pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.mean)
How can I get the sum for D and the mean for E?
I hope my question is clear enough.
Recommended answer
You can concatenate two DataFrames:
>>> df1 = pd.pivot_table(df, values=['D'], index=['B'], aggfunc=np.sum)
>>> df2 = pd.pivot_table(df, values=['E'], index=['B'], aggfunc=np.mean)
>>> pd.concat((df1, df2), axis=1)
          D         E
B
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460
Or you can pass a list of functions as the aggfunc parameter and then select the columns you need:
>>> df3 = pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=[np.sum, np.mean])
>>> df3
        sum                mean
          D         E         D         E
B
A  1.810847 -4.193425  0.226356 -0.524178
B  2.762190 -3.544245  0.345274 -0.443031
C  0.867519  0.627677  0.108440  0.078460
>>> df3 = df3.loc[:, [('sum', 'D'), ('mean', 'E')]]
>>> df3.columns = ['D', 'E']
>>> df3
          D         E
B
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460
Although, it would be nice to have an option to define aggfunc for each column individually. I don't know how it could be done; perhaps by passing a dict-like parameter into aggfunc, like {'D': np.mean, 'E': np.sum}.
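For what it's worth, current pandas releases document that the aggfunc argument of pivot_table accepts a dict mapping column names to functions, which is the option wished for above. Below is a minimal sketch, assuming a recent pandas version; the seeded generator `rng` and the result name `per_column` are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

# Seeded random data so the result is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three'] * 6,
    'B': ['A', 'B', 'C'] * 8,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# Sum of D and mean of E in a single call, keyed by column name.
per_column = pd.pivot_table(
    df,
    index='B',
    values=['D', 'E'],
    aggfunc={'D': 'sum', 'E': 'mean'},
)
print(per_column)
```

The result should have one row per value of B and two columns, D holding sums and E holding means, so no concatenation or column selection is needed afterwards.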
Update: actually, in your case you can pivot by hand:
>>> df.groupby('B').aggregate({'D':np.sum, 'E':np.mean})
          E         D
B
A -0.524178  1.810847
B -0.443031  2.762190
C  0.078460  0.867519
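A variant of the same "pivot by hand" idea is named aggregation in groupby, which also lets you choose the output column names. This is only a sketch, assuming pandas 0.25 or later; the names `D_sum`, `E_mean`, and `by_hand` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Seeded random data so the result is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'B': ['A', 'B', 'C'] * 8,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# Each keyword is an output column: (source column, aggregation).
by_hand = df.groupby('B').agg(
    D_sum=('D', 'sum'),
    E_mean=('E', 'mean'),
)
print(by_hand)
```

Because the output names are explicit, the column order follows the keyword order rather than depending on dict ordering.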