Pandas group by operations on a data frame

Problem description

I have a pandas data frame like the one below.

UsrId   JobNos
 1       4
 1       56
 2       23
 2       55
 2       41
 2       5
 3       78
 1       25
 3       1

I group the data frame by UsrId. Conceptually, the grouped data frame looks like this:

UsrId   JobNos
  1    [4,56,25]
  2    [23,55,41,5]
  3    [78,1]
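The conceptual grouped view above can be materialized as an actual Series of lists. This is a minimal sketch, assuming the sample data is built directly with `pd.DataFrame` instead of being read from text; `jobs_per_user` is an illustrative name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "UsrId":  [1, 1, 2, 2, 2, 2, 3, 1, 3],
    "JobNos": [4, 56, 23, 55, 41, 5, 78, 25, 1],
})

# Collect each user's job numbers into a list; groupby keeps the
# original row order inside each group
jobs_per_user = df.groupby("UsrId")["JobNos"].agg(list)
print(jobs_per_user)
```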

Now, I'm looking for a built-in API that gives me the UsrId with the maximum job count. In the example above, UsrId 2 has the maximum count.

UPDATE: Instead of the single UsrId with the maximum job count, I want the n UsrIds with the highest job counts. For the example above, if n=2 the output should be [2,1]. Can this be done?

Recommended answer

Something like df.groupby('UsrId').JobNos.sum().idxmax() should do it:

In [1]: import pandas as pd

In [2]: from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

In [3]: data = """UsrId   JobNos
   ...:  1       4
   ...:  1       56
   ...:  2       23
   ...:  2       55
   ...:  2       41
   ...:  2       5
   ...:  3       78
   ...:  1       25
   ...:  3       1"""

In [4]: df = pd.read_csv(StringIO(data), sep=r'\s+')

In [5]: grouped = df.groupby('UsrId')

In [6]: grouped.JobNos.sum()
Out[6]:
UsrId
1         85
2        124
3         79
Name: JobNos

In [7]: grouped.JobNos.sum().idxmax()
Out[7]: 2

If you want your results based on the number of items in each group:

In [8]: grouped.size()
Out[8]:
UsrId
1        3
2        4
3        2

In [9]: grouped.size().idxmax()
Out[9]: 2
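Note that `idxmax` returns only the first label that reaches the maximum, so a tie silently drops the other users. As a sketch (the helper `users_with_max_jobs` and the extra row for user 1 are hypothetical, added only to produce a tie), a boolean mask keeps every tied UsrId:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "UsrId":  [1, 1, 2, 2, 2, 2, 3, 1, 3],
    "JobNos": [4, 56, 23, 55, 41, 5, 78, 25, 1],
})

def users_with_max_jobs(frame):
    """Hypothetical helper: every UsrId whose row count equals the maximum."""
    counts = frame.groupby("UsrId").size()
    return counts[counts == counts.max()].index.tolist()

print(users_with_max_jobs(df))  # [2]

# Hypothetical extra row for user 1, creating a 4-4 tie with user 2
tied = pd.concat([df, pd.DataFrame({"UsrId": [1], "JobNos": [99]})], ignore_index=True)
print(tied.groupby("UsrId").size().idxmax())  # 1 (first label only)
print(users_with_max_jobs(tied))  # [1, 2]
```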

Update: To get ordered results, sort the Series with .sort_values (older pandas versions provided .order for this, but it has since been removed):

In [10]: grouped.JobNos.sum().sort_values(ascending=False)
Out[10]:
UsrId
2        124
1         85
3         79
Name: JobNos
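For the question's actual update (the top n UsrIds as a list), `Series.nlargest` combined with `.index.tolist()` returns the labels directly. This is a minimal sketch that ranks by the number of jobs per user (`size()`); the helper name `top_n_users` is hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "UsrId":  [1, 1, 2, 2, 2, 2, 3, 1, 3],
    "JobNos": [4, 56, 23, 55, 41, 5, 78, 25, 1],
})

def top_n_users(frame, n):
    """Hypothetical helper: the n UsrIds with the most jobs (rows), highest first."""
    counts = frame.groupby("UsrId").size()
    # nlargest defaults to keep="first", so ties favor the first label encountered
    return counts.nlargest(n).index.tolist()

print(top_n_users(df, 2))  # [2, 1]
```

To rank by the sum of `JobNos` as in the answer above, swap the counting line for `frame.groupby("UsrId")["JobNos"].sum()`. With this sample data that also yields `[2, 1]`. `sort_values(ascending=False).head(n)` is an equivalent, slightly more verbose alternative.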

08-19 22:08