Python pandas idxmax for multiple indexes in a dataframe

Problem description

I have a series that looks like this:

            delivery
2007-04-26  706           23
2007-04-27  705           10
            706         1089
            708           83
            710           13
            712           51
            802            4
            806            1
            812            3
2007-04-29  706           39
            708            4
            712            1
2007-04-30  705            3
            706         1016
            707            2
...
2014-11-04  1412          53
            1501           1
            1502           1
            1512           1
2014-11-05  1411          47
            1412        1334
            1501          40
            1502         433
            1504         126
            1506         100
            1508           7
            1510           6
            1512          51
            1604           1
            1612           5
Length: 26255, dtype: int64

The series comes from this query: df.groupby([df.index.date, 'delivery']).size()

For each day, I need to pull out the delivery number which has the most volume. I feel like it would be something like:

df.groupby([df.index.date, 'delivery']).size().idxmax(axis=1)

However, this just returns the idxmax for the entire dataframe; instead, I need the second-level idxmax (the delivery number, not the date) for each day, i.e. the result should be a vector with one entry per day.

Any ideas on how to accomplish this?

Recommended answer

Your example code doesn't work because idxmax is executed after the groupby operation has finished, so it runs on the whole dataframe rather than on each day.
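To make the point above concrete, here is a minimal sketch on made-up data (the timestamps and delivery numbers are assumptions, not the asker's real data). It builds the same kind of grouped size series and shows that a plain idxmax() call yields a single (date, delivery) label for the whole series, not one label per day.

```python
import pandas as pd

# Hypothetical raw data: one row per event, indexed by timestamp.
raw = pd.DataFrame(
    {"delivery": [706, 705, 706, 706, 706, 708, 708]},
    index=pd.to_datetime([
        "2007-04-26 09:00", "2007-04-27 09:00", "2007-04-27 10:00",
        "2007-04-27 11:00", "2007-04-27 12:00", "2007-04-29 09:00",
        "2007-04-29 10:00",
    ]),
)

# Same query as in the question: count rows per (day, delivery) pair.
sizes = raw.groupby([raw.index.date, "delivery"]).size()

# idxmax() sees the finished series, so it returns one label overall.
overall = sizes.idxmax()
print(sizes)
print(overall)
```

The result is one tuple for the pair with the largest count, which is why the days other than the busiest one disappear from the answer.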

I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.

Set up the data:

import pandas as pd

# Sample data: one row per (date, delivery number) with its count
d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
              '2007-04-27', '2007-04-28', '2007-04-28'],
     'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89]}

df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
print(df)

Output:

            DeliveryCount  DeliveryNb
Date
2007-04-26             23         706
2007-04-27             10         705
2007-04-27           1089         708
2007-04-27             82         450
2007-04-27             34         283
2007-04-28            100          45
2007-04-28             11          89

Create a custom function:

The trick is to use the reset_index() method (so you easily get the integer index of the group)

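def func(df):
    # reset_index() swaps the Date index for 0..n-1, so idxmax() gives a position
    idx = df.reset_index()['DeliveryCount'].idxmax()
    # use that position to pick the delivery number from the original group
    return df['DeliveryNb'].iloc[idx]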
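To see why reset_index() matters here, the following sketch runs both variants on a single group. The frame below is a hand-built copy of the 2007-04-27 rows from the sample data, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical single group: the rows for 2007-04-27 from the sample data.
group = pd.DataFrame(
    {"DeliveryCount": [10, 1089, 82, 34], "DeliveryNb": [705, 708, 450, 283]},
    index=pd.Index(["2007-04-27"] * 4, name="Date"),
)

# Without reset_index(), idxmax() returns the (repeated) date label,
# which cannot identify a single row.
label = group["DeliveryCount"].idxmax()

# With reset_index(), the index is 0..n-1, so idxmax() is a row position.
position = group.reset_index()["DeliveryCount"].idxmax()

# That position can be used with iloc on the original group.
top_delivery = group["DeliveryNb"].iloc[position]
print(label, position, top_delivery)
```

Because every row in a group shares the same Date label, only the integer position can single out the row with the largest count.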

Apply it:

g = df.groupby(df.index)
g.apply(func)

Result:

Date
2007-04-26    706
2007-04-27    708
2007-04-28     45
dtype: int64
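For a series that already has a two-level index, as in the question, one possible alternative to the custom function is to group on the first index level and call idxmax() per group. The sketch below is not part of the original answer; the small series is made up, and it assumes the date is level 0 and the delivery number is level 1.

```python
import pandas as pd

# Hypothetical counts indexed by (date, delivery), mimicking the question.
index = pd.MultiIndex.from_tuples(
    [("2007-04-26", 706), ("2007-04-27", 705), ("2007-04-27", 706),
     ("2007-04-27", 708), ("2007-04-29", 706), ("2007-04-29", 708)],
    names=[None, "delivery"],
)
counts = pd.Series([23, 10, 1089, 83, 4, 39], index=index)

# Group on the first level (the date); idxmax() then runs once per day
# and returns the full (date, delivery) label of each day's maximum.
per_day = counts.groupby(level=0).idxmax()

# Keep only the second element of each label: the delivery number.
top_delivery = per_day.map(lambda label: label[1])
print(top_delivery)
```

This keeps the work inside pandas' groupby machinery and yields one delivery number per day, which is the vector the question asks for.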
