This post covers how to extract the row with the maximum value in each group of a pandas DataFrame.

Problem description

A similar question was asked here: Python : Getting the Row which has the max value in groups using groupby

However, I need only one record per group, even if several records in that group share the maximum value.

In the example below, I need a single record for "s2". It doesn't matter to me which one.

>>> from pandas import DataFrame
>>> df = DataFrame({'Mt':['s1', 's1', 's2','s2','s2','s3'], 'Sp':['a','b','c','d','e','f'], 'Value':[1,2,3,4,5,6], 'count':[3,2,5,10,10,6]})
>>> df
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2
2  s2  c      3      5
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> idx = df.groupby(['Mt'])['count'].transform('max') == df['count']
>>> df[idx]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6

Recommended answer

You can use first:

In [14]: df.groupby('Mt').first()
Out[14]: 
   Sp  Value  count
Mt                 
s1  a      1      3
s2  c      3      5
s3  f      6      6

Update

Set as_index=False to keep Mt as a regular column instead of the index:

In [28]: df.groupby('Mt', as_index=False).first()
Out[28]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  c      3      5
2  s3  f      6      6 
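
One caveat about first() may be worth illustrating: it takes the first non-null value of each column independently, so with missing data the result can combine values from different rows. The sketch below uses a small hypothetical frame named demo (not part of the original question) and contrasts first() with head(1), which keeps the first row of each group unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, invented for illustration only
demo = pd.DataFrame({
    'Mt': ['s1', 's1'],
    'Sp': ['a', 'b'],
    'count': [np.nan, 2.0],
})

# first() picks the first non-null value per column,
# so Sp comes from row 0 while count comes from row 1
combined = demo.groupby('Mt', as_index=False).first()

# head(1) returns the first row of each group as-is, NaN included
first_rows = demo.groupby('Mt').head(1)

print(combined)
print(first_rows)
```

With no missing values, as in the example data of this question, the two approaches select the same rows.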

Update again

Sorry for misunderstanding what you meant. If you want the row with the largest count in each group, sort by count first. When several rows tie for the maximum, which of them comes first depends on the sort, so the row returned for s2 may differ:

In [196]: df.sort_values('count', ascending=False).groupby('Mt', as_index=False).first()
Out[196]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  e      5     10
2  s3  f      6      6
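
As a possible alternative to sorting, the same selection can be sketched with idxmax, which returns the index label of the first row holding the maximum in each group. This is only one way to do it, not the answer's own method; the variable names best_rows and alt are made up for illustration. The second variant stays closer to the sort-based idea above but uses drop_duplicates to keep whole rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Value': [1, 2, 3, 4, 5, 6],
    'count': [3, 2, 5, 10, 10, 6],
})

# idxmax gives, per group, the label of the first row with the largest count;
# df.loc then pulls those complete rows, exactly one per group
best_rows = df.loc[df.groupby('Mt')['count'].idxmax()]

# Sort-based variant: order by count descending with a stable sort,
# keep the first row seen for each Mt, then restore group order
alt = (
    df.sort_values('count', ascending=False, kind='mergesort')
      .drop_duplicates('Mt')
      .sort_values('Mt')
)

print(best_rows)
print(alt)
```

Both variants return whole rows, so they do not mix values from different rows when some columns contain missing data.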

