I have a DataFrame of log data that looks like this:

         date                          url  users
0  2019-09-12  http://example.com/?url=001     45
1  2019-09-12  http://example.com/?url=002     12
2  2019-09-12  http://example.com/?url=003     17
3  2019-09-12  http://example.com/?url=004     87
4  2019-09-12  http://example.com/?url=005      4


For each day, I need to extract the three URLs with the most users.

If I do this:

df.groupby(['date'])['users'].nlargest(3)


I get almost what I want:

2019-09-12  183    88
            132    62
            49     41
2019-09-13  275    95
            336    65
            206    18


But instead of the row numbers 183, 132, etc., I need the URLs, like this:

2019-09-12  http://example.com/?url=001    88
            http://example.com/?url=002    62
            http://example.com/?url=003    41
2019-09-13  http://example.com/?url=004    95
            http://example.com/?url=002    65
            http://example.com/?url=001    18


If I add the url column to the grouping like this:

df.groupby(['date','url'])['users'].nlargest(3)


I lose the url information entirely. How can I solve this?

Best answer

Just add DataFrame.set_index, so that url becomes the index and is carried through into the result:

df = df.set_index('url').groupby(['date'])['users'].nlargest(3)
print(df)
date        url
2019-09-12  http://example.com/?url=004    87
            http://example.com/?url=001    45
            http://example.com/?url=003    17
Name: users, dtype: int64
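
If a flat table is more convenient than a Series indexed by (date, url), the result can be turned back into a DataFrame with reset_index. The following is a minimal sketch, assuming a small made-up frame with two dates; the sample values and the names top3 and flat are illustrative and do not come from the original log.

```python
import pandas as pd

# Made-up sample data (an assumption for illustration): two dates, four URLs each.
df = pd.DataFrame({
    "date": ["2019-09-12"] * 4 + ["2019-09-13"] * 4,
    "url": [
        "http://example.com/?url=001",
        "http://example.com/?url=002",
        "http://example.com/?url=003",
        "http://example.com/?url=004",
        "http://example.com/?url=001",
        "http://example.com/?url=002",
        "http://example.com/?url=004",
        "http://example.com/?url=005",
    ],
    "users": [45, 12, 17, 87, 18, 65, 95, 4],
})

# Moving url into the index keeps it attached to each value through nlargest.
top3 = df.set_index("url").groupby("date")["users"].nlargest(3)

# reset_index turns the (date, url) index levels back into ordinary columns.
flat = top3.reset_index()
print(flat)
```

Here flat has the columns date, url and users, with at most three rows per date, ordered by users in descending order within each date.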


Or use DataFrame.sort_values with ascending=[True, False] together with GroupBy.head:

df = df.sort_values(['date', 'users'], ascending=[True, False]).groupby('date').head(3)
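
To see that sorting followed by head selects the same rows as nlargest, the two results can be compared on a small frame. This is only a sketch with made-up sample values (no ties in users, since tied rows might be ordered differently by the two approaches); the names by_nlargest and by_head are illustrative.

```python
import pandas as pd

# Made-up sample data (an assumption for illustration), with no tied user counts.
df = pd.DataFrame({
    "date": ["2019-09-12"] * 4 + ["2019-09-13"] * 4,
    "url": [
        "http://example.com/?url=001",
        "http://example.com/?url=002",
        "http://example.com/?url=003",
        "http://example.com/?url=004",
        "http://example.com/?url=001",
        "http://example.com/?url=002",
        "http://example.com/?url=004",
        "http://example.com/?url=005",
    ],
    "users": [45, 12, 17, 87, 18, 65, 95, 4],
})

# Approach 1: a Series indexed by (date, url).
by_nlargest = df.set_index("url").groupby("date")["users"].nlargest(3)

# Approach 2: a DataFrame that keeps all original columns and row labels.
by_head = (
    df.sort_values(["date", "users"], ascending=[True, False])
      .groupby("date")
      .head(3)
)

# Compare the selected URLs and user counts, row by row.
same_urls = by_head["url"].tolist() == [url for _, url in by_nlargest.index]
same_users = by_head["users"].tolist() == by_nlargest.tolist()
print(same_urls, same_users)
```

The main practical difference is the shape of the result: the head variant returns whole rows of the original DataFrame, which is convenient when more columns than users are needed.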


Test with changed data:

print (df)
         date                          url  users
0  2019-09-12  http://example.com/?url=001     45
1  2019-09-12  http://example.com/?url=002     12
2  2019-09-13  http://example.com/?url=003     17
3  2019-09-13  http://example.com/?url=004     87
4  2019-09-13  http://example.com/?url=005      4

df1 = df.set_index('url').groupby(['date'])['users'].nlargest(3)
print (df1)
date        url
2019-09-12  http://example.com/?url=001    45
            http://example.com/?url=002    12
2019-09-13  http://example.com/?url=004    87
            http://example.com/?url=003    17
            http://example.com/?url=005     4
Name: users, dtype: int64

df2 = df.sort_values(['date', 'users'], ascending=[True, False]).groupby('date').head(3)
print (df2)
         date                          url  users
0  2019-09-12  http://example.com/?url=001     45
1  2019-09-12  http://example.com/?url=002     12
3  2019-09-13  http://example.com/?url=004     87
2  2019-09-13  http://example.com/?url=003     17
4  2019-09-13  http://example.com/?url=005      4

Source: "python - Get names with pandas groupby", a similar question on Stack Overflow: https://stackoverflow.com/questions/57969757/
