How to perform a complex df operation on all rows of a multi-index dataframe?

Problem description

This is a follow-up to this question, for the case where the input dataframe has a MultiIndex rather than a regular index.

I would like to take some complicated operation foo, which accepts a single-index DataFrame, and apply it separately to each sub-DataFrame formed by the rows that share the same value in level 0 of the two-level index.

Take the same input as in the linked question, augmented with a two-level MultiIndex:

i0 i1   0 1   2 
0  0    0 "5" a 
   1    1 "4" b
1  2    2 "3" c
   3    3 "2" d 
   4    4 "1" e 
   5    5 "0" f
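To make the example reproducible, the input table above can be rebuilt in pandas roughly as follows. This is a sketch that assumes integer column labels 0, 1, 2 and string values in column 1 (the quotes in the table suggest strings); the variable names index and df are illustrative.

```python
import pandas as pd

# Two-level index (i0, i1), as in the table above
index = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5]],
    names=["i0", "i1"],
)

# Assumed: integer column labels 0, 1, 2; column 1 holds strings such as "5"
df = pd.DataFrame(
    {
        0: [0, 1, 2, 3, 4, 5],
        1: ["5", "4", "3", "2", "1", "0"],
        2: ["a", "b", "c", "d", "e", "f"],
    },
    index=index,
)

# Selecting one level-0 label yields a sub-frame indexed by i1 only
print(df.loc[1])
```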

So I would like to apply foo to

i1     0 1   2
 0      0 "5" a
 1      1 "4" b

and then to

i1     0 1   2
 2     2 "3" c
 3     3 "2" d
 4     4 "1" e
 5     5 "0" f

to obtain another DataFrame whose columns depend on what foo returns.

For a function foo like the one in the referenced question,

the combined result would be:

i0 i1 res
0  0  "05,24"
   1  "05,24"
1  2  "43,62"
   3  "43,62"
   4  "81,100"
   5  "81,100"

My attempt:

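import pandas as pd
from typing import Callable

def row_reduce(col0, col1):
    # e.g. (0, "5") -> "05": twice the first value, followed by the second
    return str(2 * col0) + str(col1)

def col_reduce(rows_data):
    # Join the per-row strings of one pair of rows with commas
    return ",".join(rows_data)

def foo(df):
    res = (df.apply(lambda x: row_reduce(x[0], x[1]), axis=1)
                   .groupby(df.index // 2)  # fails when df.index is a MultiIndex
                   .transform(col_reduce))
    return res


def _perform_operation_on_all_main_ind(df: pd.DataFrame, op: Callable[[pd.DataFrame], pd.DataFrame]):
    # Apply op separately to each group of rows sharing the same level-0 label
    return df.groupby(level=0).apply(op)

_perform_operation_on_all_main_ind(df, foo)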

This raises TypeError: cannot perform __floordiv__ with this index type: MultiIndex, which means the groups passed to foo still have the MultiIndex; it is not reduced to a single index.

Maybe I have this backwards; could you point me in the right direction?
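The TypeError comes from the fact that the groups handed to the function by groupby(level=0) keep the full (i0, i1) MultiIndex of the parent frame, so df.index // 2 inside foo operates on a MultiIndex. The sketch below rebuilds the sample frame (assuming integer column labels 0, 1, 2 and strings in column 1) and checks this; it also shows that DataFrame.droplevel(0) is what would produce the single-level index the question expected. The list names are illustrative.

```python
import pandas as pd

# Sample input (assumed: integer column labels, strings in column 1)
index = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5]], names=["i0", "i1"]
)
df = pd.DataFrame(
    {0: [0, 1, 2, 3, 4, 5], 1: ["5", "4", "3", "2", "1", "0"], 2: list("abcdef")},
    index=index,
)

levels_seen = []     # number of index levels in each group
raised = []          # whether group.index // 2 raised TypeError
dropped_levels = []  # number of index levels after droplevel(0)

for _, group in df.groupby(level=0):
    # Each group keeps the parent's (i0, i1) MultiIndex
    levels_seen.append(group.index.nlevels)
    try:
        _ = group.index // 2
        raised.append(False)
    except TypeError as exc:
        # Same kind of error as in the question: MultiIndex does not support //
        print("TypeError:", exc)
        raised.append(True)
    # Dropping level 0 leaves an index on i1 only
    dropped_levels.append(group.droplevel(0).index.nlevels)

print(levels_seen, raised, dropped_levels)
```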

Recommended answer

You can change foo so that it groups by the integer division of a helper array as long as the DataFrame (np.arange(len(df)) // 2) instead of the index itself, and pass group_keys=False to avoid duplicating the first level of the MultiIndex:

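import numpy as np

def foo(df):
    res = (df.apply(lambda x: row_reduce(x[0], x[1]), axis=1)
                   # positions 0..len(df)-1 within the group, paired as 0, 0, 1, 1, ...
                   .groupby(np.arange(len(df)) // 2)
                   .transform(col_reduce))
    return res


# group_keys=False keeps the result indexed by the original (i0, i1) MultiIndex
df = df.groupby(level=0, group_keys=False).apply(foo)
print(df)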
i0  i1
0   0      05,24
    1      05,24
1   2      43,62
    3      43,62
    4     81,100
    5     81,100
dtype: object
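An alternative that avoids groupby(...).apply altogether is sketched below; it is a different technique from the answer, not a replacement for it. GroupBy.cumcount gives each row's position inside its level-0 group, integer division by 2 forms the pairs, and grouping on the level-0 label together with that pair id reproduces the same result in one transform call. row_reduce and col_reduce are the helpers from the question; the sample frame assumes integer column labels and strings in column 1, and the names per_row and pair_id are illustrative.

```python
import pandas as pd

# Sample input (assumed: integer column labels, strings in column 1)
index = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5]], names=["i0", "i1"]
)
df = pd.DataFrame(
    {0: [0, 1, 2, 3, 4, 5], 1: ["5", "4", "3", "2", "1", "0"], 2: list("abcdef")},
    index=index,
)

def row_reduce(col0, col1):
    # Helper from the question: twice the first value, then the second as text
    return str(2 * col0) + str(col1)

def col_reduce(rows_data):
    # Helper from the question: comma-join the strings of one pair of rows
    return ",".join(rows_data)

# Row-wise reduction over the whole frame at once
per_row = df.apply(lambda x: row_reduce(x[0], x[1]), axis=1)

# Position within each level-0 group (0, 1 | 0, 1, 2, 3), halved to pair rows
pair_id = df.groupby(level=0).cumcount() // 2

# Group on (level-0 label, pair id); transform broadcasts the joined string back
res = per_row.groupby(
    [df.index.get_level_values(0).to_numpy(), pair_id.to_numpy()]
).transform(col_reduce)

print(res)
```

Because the row-wise step runs once over the whole frame instead of once per group, this may be simpler to reason about for large inputs, though whether it is faster depends on the data.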