Problem description
I have a series that looks like this:
            delivery
2007-04-26  706           23
2007-04-27  705           10
            706         1089
            708           83
            710           13
            712           51
            802            4
            806            1
            812            3
2007-04-29  706           39
            708            4
            712            1
2007-04-30  705            3
            706         1016
            707            2
...
2014-11-04  1412          53
            1501           1
            1502           1
            1512           1
2014-11-05  1411          47
            1412        1334
            1501          40
            1502         433
            1504         126
            1506         100
            1508           7
            1510           6
            1512          51
            1604           1
            1612           5
Length: 26255, dtype: int64
It is produced by this query: df.groupby([df.index.date, 'delivery']).size()
For each day, I need to pull out the delivery number that has the most volume. I feel like it would be something like:
df.groupby([df.index.date, 'delivery']).size().idxmax(axis=1)
However, this just returns the idxmax for the entire dataframe; instead, I need the second-level idxmax (the delivery number, not the date) for each day, i.e. the result should be a vector with one entry per day.
Any ideas on how to accomplish this?
Recommended answer
Your example code doesn't work because idxmax is executed after the groupby operation, so it runs on the whole result rather than within each day.
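To make that concrete, here is a minimal sketch with made-up toy data (the `sizes` series and its level names are assumptions that only stand in for the question's real output). Calling idxmax on a series with a two-level index gives back a single (date, delivery) label for the overall maximum, not one label per day.

```python
import pandas as pd

# Hypothetical toy data shaped like the question's groupby(...).size() result
idx = pd.MultiIndex.from_tuples(
    [("2007-04-26", 706), ("2007-04-27", 705), ("2007-04-27", 706),
     ("2007-04-29", 706), ("2007-04-29", 708)],
    names=["date", "delivery"],
)
sizes = pd.Series([23, 10, 1089, 39, 4], index=idx)

# idxmax sees the whole series, so it returns one index label (a tuple)
overall = sizes.idxmax()
print(overall)
```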
I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.
Set up the data:
import pandas as pd

d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
              '2007-04-27', '2007-04-28', '2007-04-28'],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
     'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11]}
# Index by date; several rows can share the same date
df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
print(df)
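Output (newer pandas versions keep the dict's insertion order, so DeliveryNb may be printed before DeliveryCount):
            DeliveryCount  DeliveryNb
Date
2007-04-26             23         706
2007-04-27             10         705
2007-04-27           1089         708
2007-04-27             82         450
2007-04-27             34         283
2007-04-28            100          45
2007-04-28             11          89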
Create a custom function:
The trick is to use the reset_index() method (so you easily get the integer index of the row within the group)
def func(df):
    # After reset_index(), idxmax() returns a 0-based row position
    idx = df.reset_index()['DeliveryCount'].idxmax()
    # Look up the delivery number at that position
    return df['DeliveryNb'].iloc[idx]
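The following sketch illustrates why reset_index() matters here. It rebuilds one day's group by hand (the `group` frame is an illustrative assumption using the sample values for 2007-04-27). Every row in a group carries the same date label, so idxmax on the group itself only returns that date, while idxmax after reset_index() returns a row position that iloc can use.

```python
import pandas as pd

# One day's group, roughly as groupby would pass it to the custom function
group = pd.DataFrame(
    {"DeliveryNb": [705, 708, 450, 283], "DeliveryCount": [10, 1089, 82, 34]},
    index=pd.Index(["2007-04-27"] * 4, name="Date"),
)

# Without reset_index: the label of the max row, which is just the shared date
label = group["DeliveryCount"].idxmax()

# With reset_index: the index is 0..n-1, so idxmax is a row position
position = group.reset_index()["DeliveryCount"].idxmax()

print(label, position)
print(group["DeliveryNb"].iloc[position])
```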
Apply it:
g = df.groupby(df.index)
g.apply(func)
Result:
Date
2007-04-26    706
2007-04-27    708
2007-04-28     45
dtype: int64
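As a possible alternative that works directly on the shape from the question, idxmax can also be applied per group on the first index level. This is only a sketch: the `raw` frame below is invented sample data, and the query line is copied from the question. Each per-day idxmax is a (day, delivery) tuple, so the delivery number is its second element.

```python
import pandas as pd

# Hypothetical raw log: one row per event, DatetimeIndex plus a 'delivery' column
raw = pd.DataFrame(
    {"delivery": [706, 705, 706, 706, 708, 706, 708, 708]},
    index=pd.to_datetime(
        ["2007-04-26", "2007-04-27", "2007-04-27", "2007-04-27",
         "2007-04-27", "2007-04-29", "2007-04-29", "2007-04-29"]
    ),
)

# Same query as in the question: number of rows per (day, delivery)
sizes = raw.groupby([raw.index.date, "delivery"]).size()

# idxmax within each day gives one (day, delivery) tuple per day
per_day = sizes.groupby(level=0).idxmax()

# Keep only the second element of each tuple: the delivery number
top_delivery = per_day.map(lambda pair: pair[1])
print(top_delivery)
```

If two deliveries tie for a day, idxmax keeps the first one it encounters, so ties may need separate handling.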