


I have a DataFrame of values from a file, which I have grouped by two columns to get a count for each group. Now I want to sort by the count, largest first, but I get the following error:

It looks like the count column produced by the groupby/agg is some sort of index, so I'm not sure how to do this. I'm a beginner with Python and pandas. Here's the actual code; please let me know if you need more detail:

def answer_five():
    df = census_df#.set_index(['STNAME'])
    df = df[df['SUMLEV'] == 50]
    df = df[['STNAME','CTYNAME']].groupby(['STNAME']).agg(['count']).sort(['count'])
    # get sorted count max item
    return df.head(5)
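
The remark about the count column being "some sort of index" can be checked on a small frame. The sketch below uses made-up toy data (the `toy` frame and its values are assumptions, not the census file) to show that `agg(['count'])` yields two-level column labels, so there is no plain `'count'` column to sort by:

```python
import pandas as pd

# Toy data standing in for census_df (hypothetical values).
toy = pd.DataFrame({'STNAME': ['a', 'a', 'b', 'b', 'b', 'c'],
                    'CTYNAME': ['x1', 'x2', 'y1', 'y2', 'y3', 'z1']})

grouped = toy[['STNAME', 'CTYNAME']].groupby(['STNAME']).agg(['count'])

# The columns form a two-level MultiIndex: (original column, aggregation).
print(grouped.columns.tolist())   # [('CTYNAME', 'count')]

# Sorting this frame directly needs the full column key as a tuple.
by_count = grouped.sort_values(('CTYNAME', 'count'), ascending=False)
print(by_count.index.tolist())    # ['b', 'a', 'c']
```

Selecting the single column before aggregating, as the answer does, avoids the two-level labels altogether.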


I think you need to add reset_index and then call sort_values with ascending=False. DataFrame.sort is deprecated in favor of sort_values (and has been removed from recent pandas versions):

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                             .count() \
                             .reset_index(name='count') \
                             .sort_values(['count'], ascending=False) \
                             .head(5)


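import pandas as pd

# columns= fixes the column order so the printout below matches
df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                   'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]},
                  columns=['CTYNAME','STNAME'])

print (df)
    CTYNAME STNAME
0         4      a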
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                             .count() \
                             .reset_index(name='count') \
                             .sort_values(['count'], ascending=False) \
                             .head(5)

print (df)
  STNAME  count
2      c      5
5      s      4
1      b      3
0      a      2
3      d      1

But it seems what you really need is Series.nlargest:

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)

Or:

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)

The difference: size counts every row in a group, including rows where the value is NaN, whereas count skips NaN values.
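
A minimal sketch of that difference, using a made-up frame (the `toy` data is an assumption for illustration) in which one CTYNAME value is missing:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one CTYNAME value in group 'a' is NaN.
toy = pd.DataFrame({'STNAME': ['a', 'a', 'a', 'b', 'b'],
                    'CTYNAME': ['x1', np.nan, 'x3', 'y1', 'y2']})

by_state = toy.groupby('STNAME')['CTYNAME']

sizes = by_state.size().tolist()     # rows per group, NaN included
counts = by_state.count().tolist()   # non-NaN values per group

print(sizes)    # [3, 2]
print(counts)   # [2, 2]
```

If the column has no missing values, both calls give the same numbers, so either works with nlargest.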


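# columns= fixes the column order so the printout below matches
df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                   'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]},
                  columns=['CTYNAME','STNAME'])

print (df)
    CTYNAME STNAME
0         4      a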
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e

df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
                             .size() \
                             .nlargest(5) \
                             .reset_index(name='top5')
print (df)
  STNAME  top5
0      c     5
1      s     4
2      b     3
3      a     2
4      d     1
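
Applied to the function from the question, the nlargest approach might look like the sketch below. This is only an illustration: passing `census_df` as a parameter is a change from the original (which read a global), and the small frame at the bottom is invented stand-in data, not the real census file.

```python
import pandas as pd

def answer_five(census_df):
    """Return the five STNAME groups with the most rows where SUMLEV == 50."""
    rows = census_df[census_df['SUMLEV'] == 50]   # same filter as the question
    return (rows.groupby('STNAME')['CTYNAME']
                .size()                            # rows per state
                .nlargest(5)                       # five largest, descending
                .reset_index(name='count'))

# Hypothetical stand-in for census_df.
census_df = pd.DataFrame({
    'SUMLEV':  [40, 50, 50, 40, 50, 50, 50],
    'STNAME':  ['a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'CTYNAME': ['a', 'a1', 'a2', 'b', 'b1', 'b2', 'b3'],
})

result = answer_five(census_df)
print(result)
```

Here `result` has the columns STNAME and count, with state b (3 rows) ahead of state a (2 rows).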

