I am working with the following baby names dataset:
https://raw.githubusercontent.com/hadley/data-baby-names/master/baby-names.csv

The data looks like this:

"year","name","percent","sex"
1880,"John",0.081541,"boy"
1880,"William",0.080511,"boy"
1880,"James",0.050057,"boy"
1880,"Charles",0.045167,"boy"
1880,"George",0.043292,"boy"
1880,"Frank",0.02738,"boy"
1880,"Joseph",0.022229,"boy"


I group the rows by name and sex and sum the percentages, so each name gets one total for boys and one for girls:

data1.groupby(['name','sex'])[['percent']].sum()


This creates a multi-indexed DataFrame:

   Name    Sex  Percent

   Aaron   boy  0.292292
           girl 0.000805
   Abagail girl 0.001326
   Abbie   boy  0.000092
           girl 0.022804


For each name, I want to return only the row for the sex with the higher percentage, in a new DataFrame, for example:

   Name    Sex  Percent

   Aaron   boy  0.292292
   Abagail girl 0.001326
   Abbie   girl 0.022804


I have been going through the multi-index documentation but can't figure it out. Any help is appreciated.

Best answer

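You can do the groupby sum first, then sort by percent and use groupby tail(1) to keep the row with the largest value for each name: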

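# Sum percent per (name, sex); data1 is the DataFrame from the question
s = data1.groupby(['name', 'sex'])[['percent']].sum()
# Sort ascending by the 'percent' column (lowercase, as in the CSV header),
# then keep the last (largest) row within each name (index level 0)
s.sort_values('percent').groupby(level=0).tail(1)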
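
To see the sort-then-tail idea end to end, here is a minimal sketch on a few made-up rows shaped like the CSV (the values are hypothetical, not taken from the real dataset). It also shows `idxmax` as an alternative way to pick the top row per name; that alternative is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as baby-names.csv (values are made up).
data1 = pd.DataFrame({
    "year": [1880, 1881, 1880, 1880, 1880, 1881],
    "name": ["Aaron", "Aaron", "Aaron", "Abagail", "Abbie", "Abbie"],
    "percent": [0.25, 0.125, 0.0625, 0.125, 0.0625, 0.25],
    "sex": ["boy", "boy", "girl", "girl", "boy", "girl"],
})

# Sum percent per (name, sex); the result has a two-level MultiIndex.
s = data1.groupby(["name", "sex"])[["percent"]].sum()

# Approach from the answer: sort ascending, keep the last row of each name.
# sort_index() afterwards only restores alphabetical order for display.
top_by_tail = s.sort_values("percent").groupby(level="name").tail(1).sort_index()

# Alternative (an assumption, not from the answer): idxmax gives the
# (name, sex) label of the largest percent within each name.
labels = s.groupby(level="name")["percent"].idxmax().tolist()
top_by_idxmax = s.loc[labels]

print(top_by_tail)
```

Both variants keep `sex` in the index. If you want it as a regular column, calling `reset_index()` on the result should do that.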

Source: a similar question about Python multi-index slicing and comparison on Stack Overflow: https://stackoverflow.com/questions/50457329/
