I have a DataFrame of log data that looks like this:
date url users
0 2019-09-12 http://example.com/?url=001 45
1 2019-09-12 http://example.com/?url=002 12
2 2019-09-12 http://example.com/?url=003 17
3 2019-09-12 http://example.com/?url=004 87
4 2019-09-12 http://example.com/?url=005 4
For each day, I need to extract the three URLs with the most visitors.
If I do this:
df.groupby(['date'])['users'].nlargest(3)
I get almost what I want:
2019-09-12 183 88
132 62
49 41
2019-09-13 275 95
336 65
206 18
However, instead of the row numbers 183, 132, etc., I need the URLs, like this:
2019-09-12 http://example.com/?url=001 88
http://example.com/?url=002 62
http://example.com/?url=003 41
2019-09-13 http://example.com/?url=004 95
http://example.com/?url=002 65
http://example.com/?url=001 18
If I add the URL to the grouping like this:
df.groupby(['date','url'])['users'].nlargest(3)
I lose the URL information entirely. How can I solve this?
Best answer
Set url as the index before grouping, so that it is carried into the index of the result:
df = df.set_index('url').groupby(['date'])['users'].nlargest(3)
print (df)
date url
2019-09-12 http://example.com/?url=004 87
http://example.com/?url=001 45
http://example.com/?url=003 17
Name: users, dtype: int64
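If a flat table is easier to work with than a Series with a (date, url) MultiIndex, the result can be turned back into ordinary columns. The following is a minimal sketch, assuming the sample data from the question; the names top3 and top3_flat are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (a single day).
df = pd.DataFrame({
    "date": ["2019-09-12"] * 5,
    "url": [f"http://example.com/?url=00{i}" for i in range(1, 6)],
    "users": [45, 12, 17, 87, 4],
})

# Moving url into the index means nlargest keeps it as an index level
# next to date, instead of the original row numbers.
top3 = df.set_index("url").groupby("date")["users"].nlargest(3)

# reset_index turns both index levels (date, url) back into columns.
top3_flat = top3.reset_index()
print(top3_flat)
```

Because the Series is named users and its index levels are named date and url, reset_index should yield the three columns date, url and users.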
Alternatively, use DataFrame.sort_values with ascending=[True, False] together with GroupBy.head:
df = df.sort_values(['date', 'users'], ascending=[True, False]).groupby('date').head(3)
Testing with changed data (two dates):
print (df)
date url users
0 2019-09-12 http://example.com/?url=001 45
1 2019-09-12 http://example.com/?url=002 12
2 2019-09-13 http://example.com/?url=003 17
3 2019-09-13 http://example.com/?url=004 87
4 2019-09-13 http://example.com/?url=005 4
df1 = df.set_index('url').groupby(['date'])['users'].nlargest(3)
print (df1)
date url
2019-09-12 http://example.com/?url=001 45
http://example.com/?url=002 12
2019-09-13 http://example.com/?url=004 87
http://example.com/?url=003 17
http://example.com/?url=005 4
Name: users, dtype: int64
df2 = df.sort_values(['date', 'users'], ascending=[True, False]).groupby('date').head(3)
print (df2)
date url users
0 2019-09-12 http://example.com/?url=001 45
1 2019-09-12 http://example.com/?url=002 12
3 2019-09-13 http://example.com/?url=004 87
2 2019-09-13 http://example.com/?url=003 17
4 2019-09-13 http://example.com/?url=005 4
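The two approaches return different shapes (a Series with a MultiIndex versus a DataFrame that keeps the original row labels), but on this test data they select the same rows in the same order. Here is a small sketch that checks this, assuming the changed data above; flat1, flat2 and same_rows are hypothetical names introduced only for the comparison, and data with tied users values is not covered by this check.

```python
import pandas as pd

# Changed test data from the answer (two dates).
df = pd.DataFrame({
    "date": ["2019-09-12", "2019-09-12", "2019-09-13", "2019-09-13", "2019-09-13"],
    "url": [f"http://example.com/?url=00{i}" for i in range(1, 6)],
    "users": [45, 12, 17, 87, 4],
})

# Approach 1: url in the index, then the 3 largest users values per date.
df1 = df.set_index("url").groupby("date")["users"].nlargest(3)

# Approach 2: sort by date ascending and users descending, keep the first 3 rows per date.
df2 = df.sort_values(["date", "users"], ascending=[True, False]).groupby("date").head(3)

# Bring both results to the same flat layout before comparing.
flat1 = df1.reset_index()
flat2 = df2[["date", "url", "users"]].reset_index(drop=True)

same_rows = flat1.values.tolist() == flat2.values.tolist()
print(same_rows)
```

The sort_values variant may be preferable when other columns of the original DataFrame are needed, since head returns whole rows rather than only the users values.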
This question (python: getting names with a pandas groupby) is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57969757/