Problem description
I have a dataframe of values read from a file, which I have grouped by two columns to get a count for each group. Now I want to sort by the max count value, but I get an error.
It looks like the group-by count column is some sort of index, so I'm not sure how to do this. I'm a beginner with Python and pandas. Here's the actual code; please let me know if you need more detail:
def answer_five():
    df = census_df#.set_index(['STNAME'])
    df = df[df['SUMLEV'] == 50]
    df = df[['STNAME','CTYNAME']].groupby(['STNAME']).agg(['count']).sort(['count'])
    #df.set_index(['count'])
    print(df.index)
    # get sorted count max item
    return df.head(5)
Recommended answer
I think you need to add reset_index and then pass ascending=False to sort_values, because DataFrame.sort is deprecated in favor of sort_values:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
.count() \
.reset_index(name='count') \
.sort_values(['count'], ascending=False) \
.head(5)
Sample:
df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})
print (df)
CTYNAME STNAME
0 4 a
1 5 b
2 6 s
3 5 c
4 6 s
5 2 c
6 3 b
7 4 c
8 5 d
9 6 b
10 4 c
11 5 s
12 4 s
13 3 c
14 6 a
15 5 e
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
.count() \
.reset_index(name='count') \
.sort_values(['count'], ascending=False) \
.head(5)
print (df)
STNAME count
2 c 5
5 s 4
1 b 3
0 a 2
3 d 1
But it seems you need Series.nlargest:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)
or:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)
size counts NaN values, count does not.
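To make the difference between size and count concrete, here is a minimal sketch. The toy frame and its values are made up for illustration (they are not from the census data); one CTYNAME value is deliberately missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: state 'a' has two rows, one with a missing CTYNAME.
toy = pd.DataFrame({'STNAME': ['a', 'a', 'b'],
                    'CTYNAME': [1.0, np.nan, 3.0]})

grouped = toy.groupby('STNAME')['CTYNAME']
sizes = grouped.size()    # number of rows per group, NaN rows included
counts = grouped.count()  # number of non-NaN values per group

print(sizes)
print(counts)
```

For group 'a', size reports 2 rows while count reports 1 value, so the two only agree when the counted column has no missing values.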
Sample:
df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})
print (df)
CTYNAME STNAME
0 4 a
1 5 b
2 6 s
3 5 c
4 6 s
5 2 c
6 3 b
7 4 c
8 5 d
9 6 b
10 4 c
11 5 s
12 4 s
13 3 c
14 6 a
15 5 e
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'] \
.size() \
.nlargest(5) \
.reset_index(name='top5')
print (df)
STNAME top5
0 c 5
1 s 4
2 b 3
3 a 2
4 d 1
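As a side note on why the code in the question probably could not find a 'count' column: calling agg(['count']) on a grouped DataFrame produces two-level column labels, so the column is named ('CTYNAME', 'count') rather than 'count'. The exact error message is not shown in the question, so this is only a likely explanation. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical toy data, not the census file from the question.
toy = pd.DataFrame({'STNAME': list('aabbbc'),
                    'CTYNAME': [1, 2, 3, 4, 5, 6]})

agg = toy[['STNAME', 'CTYNAME']].groupby(['STNAME']).agg(['count'])

# The columns are a MultiIndex: the only label is the tuple ('CTYNAME', 'count').
print(agg.columns.tolist())

# Sorting therefore needs the full tuple as the key.
top = agg.sort_values(('CTYNAME', 'count'), ascending=False)
print(top.head(5))
```

Selecting the single column before counting, as in the answers above, avoids the two-level labels altogether.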