Problem description
I have the following dataframe:
import pandas
mydf = pandas.DataFrame({"cat": ["first", "first", "first", "second", "second", "third"], "class": ["A", "A", "A", "B", "B", "C"], "name": ["a1", "a2", "a3", "b1", "b2", "c1"], "val": [1, 5, 1, 1, 2, 10]})
I want to create a dataframe with summary statistics on the val column for items that share the same class id. For this I use groupby as follows:
mydf.groupby("class").val.sum()
That's the correct behavior, but I'd like to retain the cat column information in the resulting dataframe. Can that be done, or do I have to merge/join that info in later? I tried:
mydf.groupby(["cat", "class"]).val.sum()
But this uses hierarchical indexing. I'd like to get back a plain dataframe that just has the cat value for each group, where the grouping is by class. The output should be a dataframe (not a series) with the cat and class values, where the val entries are summed over all rows that share the same class:
cat     class  val
first   A        7
second  B        3
third   C       10
Is this possible?
Recommended answer
Use reset_index:
In [9]: mydf.groupby(['cat', "class"]).val.sum().reset_index()
Out[9]:
      cat class  val
0   first     A    7
1  second     B    3
2   third     C   10
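As a self-contained sketch of what happens here (the variable names `summed` and `flat` are illustrative, not from the original answer): grouping by two columns produces a Series with a two-level index, and `reset_index()` turns both levels back into ordinary columns.

```python
import pandas as pd

mydf = pd.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# Grouping by two keys gives a Series indexed by a (cat, class) MultiIndex.
summed = mydf.groupby(["cat", "class"])["val"].sum()
print(type(summed.index).__name__)

# reset_index() moves every index level into a regular column,
# leaving a plain DataFrame with a default integer index.
flat = summed.reset_index()
print(flat)
```

This works cleanly here because each class belongs to exactly one cat, so adding cat as a grouping key does not split any class group.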
EDIT
Pass level=1 to reset_index if you want to keep cat as the index and move only class back into a column:
In [10]: mydf.groupby(['cat', "class"]).val.sum().reset_index(level=1)
Out[10]:
       class  val
cat
first      A    7
second     B    3
third      C   10
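A small variation that may read more clearly: `reset_index` also accepts a level name, so the level can be referred to as `"class"` rather than by position. The sketch below (with illustrative variable names) checks that both spellings give the same frame.

```python
import pandas as pd

mydf = pd.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

summed = mydf.groupby(["cat", "class"])["val"].sum()

# level=1 is the second grouping key, i.e. "class".
by_position = summed.reset_index(level=1)

# The same level addressed by name; "cat" stays behind as the index.
by_name = summed.reset_index(level="class")

print(by_name.index.name)
print(by_name.equals(by_position))
```

Using the name avoids having to recount positions if the list of grouping keys changes later.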
You can also pass as_index=False to groupby to get the same output as the plain reset_index() call:
In [29]: mydf.groupby(['cat', "class"], as_index=False).val.sum()
Out[29]:
      cat class  val
0   first     A    7
1  second     B    3
2   third     C   10
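The sketch below checks that the two approaches agree, and then shows one possible alternative that is not part of the original answer: grouping by class alone and carrying cat along with named aggregation. That alternative assumes every class maps to a single cat value, as it does in this data; `"first"` simply takes the first cat seen in each group.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

mydf = pd.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# Approach 1: group, sum, then flatten the MultiIndex.
via_reset = mydf.groupby(["cat", "class"])["val"].sum().reset_index()

# Approach 2: keep the grouping keys as columns from the start.
via_flag = mydf.groupby(["cat", "class"], as_index=False)["val"].sum()

# Raises AssertionError if the two frames differ.
assert_frame_equal(via_reset, via_flag)

# Alternative (assumption: one cat per class): group by class only and
# aggregate cat with "first" while summing val.
by_class = (
    mydf.groupby("class", as_index=False)
        .agg(cat=("cat", "first"), val=("val", "sum"))
)[["cat", "class", "val"]]
print(by_class)
```

If a class could span several cat values, the `"first"` aggregation would silently drop the others, so grouping by both columns is the safer default.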