Problem description
This is a follow-up to this question, for the case where the input DataFrame has a MultiIndex rather than a regular index.
I would like to take some complicated operation foo, which accepts a single-index DataFrame, and apply it to each of the sub-DataFrames obtained by grouping the rows on level 0 of the two-level index.
Take the same input as in the linked question, augmented to also have a MultiIndex:
i0 i1  0    1  2
0  0   0  "5"  a
   1   1  "4"  b
1  2   2  "3"  c
   3   3  "2"  d
   4   4  "1"  e
   5   5  "0"  f
So I would like to apply foo to

i1  0    1  2
0   0  "5"  a
1   1  "4"  b

and then to

i1  0    1  2
2   2  "3"  c
3   3  "2"  d
4   4  "1"  e
5   5  "0"  f

to obtain another DataFrame, whose columns depend on what foo returns.
For a function foo like the one in the referenced question, the result would be

i0 i1  res
0  0   "05,24"
   1   "05,24"
1  2   "43,62"
   3   "43,62"
   4   "81,100"
   5   "81,100"
My attempt:
from typing import Callable

import pandas as pd

def row_reduce(col0, col1):
    return str(2 * col0) + str(col1)

def col_reduce(rows_data):
    return ",".join(rows_data)

def foo(df):
    res = (df.apply(lambda x: row_reduce(x[0], x[1]), axis=1)
             .groupby(df.index // 2)
             .transform(col_reduce))
    return res

def _perform_operation_on_all_main_ind(df: pd.DataFrame, op: Callable[[pd.DataFrame], pd.DataFrame]):
    return df.groupby(level=0).apply(op)

_perform_operation_on_all_main_ind(df, foo)
This raises TypeError: cannot perform __floordiv__ with this index type: MultiIndex, meaning the MultiIndex did not get reduced to a single index inside each group.
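A quick check can illustrate this. The sketch below rebuilds the example frame (the construction is an assumption, since the original input code is not shown) and inspects the groups that groupby(level=0) yields: each one still carries both index levels. It also shows that, if the i0 level is dropped with droplevel, integer division on the remaining index is possible.

```python
import pandas as pd

# Assumed reconstruction of the example frame from the tables above.
index = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5]], names=["i0", "i1"]
)
df = pd.DataFrame(
    {0: range(6), 1: list("543210"), 2: list("abcdef")}, index=index
)

# Number of index levels each group still has when iterating the groupby.
levels_per_group = {
    int(key): group.index.nlevels for key, group in df.groupby(level=0)
}
print(levels_per_group)

# After dropping level i0, the remaining index is a plain integer index,
# so floor division works on it.
buckets = {
    int(key): (group.droplevel("i0").index // 2).tolist()
    for key, group in df.groupby(level=0)
}
print(buckets)
```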
Maybe I have it backwards and you can point me in the right direction.
Recommended answer
You can change foo so that the integer division is done on a helper array built from the length of the DataFrame, np.arange(len(df)), rather than on the index. Then add group_keys=False to the groupby call to avoid duplicating the first level of the MultiIndex:
import numpy as np

def foo(df):
    # group consecutive pairs of rows by position, not by index label
    res = (df.apply(lambda x: row_reduce(x[0], x[1]), axis=1)
             .groupby(np.arange(len(df)) // 2)
             .transform(col_reduce))
    return res

df = df.groupby(level=0, group_keys=False).apply(foo)
print(df)
i0  i1
0   0      05,24
    1      05,24
1   2      43,62
    3      43,62
    4     81,100
    5     81,100
dtype: object
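For reference, here is a self-contained sketch of the whole approach that can be run as is. The DataFrame construction is an assumption reconstructed from the tables in the question; the helper functions are the ones from the question, with foo changed as described above.

```python
import numpy as np
import pandas as pd

def row_reduce(col0, col1):
    return str(2 * col0) + str(col1)

def col_reduce(rows_data):
    return ",".join(rows_data)

def foo(df):
    # One string per row, built from columns 0 and 1.
    rows = df.apply(lambda x: row_reduce(x[0], x[1]), axis=1)
    # Pair up consecutive rows by position, independent of the index type.
    return rows.groupby(np.arange(len(df)) // 2).transform(col_reduce)

# Assumed reconstruction of the example input.
index = pd.MultiIndex.from_arrays(
    [[0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5]], names=["i0", "i1"]
)
df = pd.DataFrame(
    {0: range(6), 1: list("543210"), 2: list("abcdef")}, index=index
)

# group_keys=False keeps the group key from being prepended to the index.
result = df.groupby(level=0, group_keys=False).apply(foo)
print(result)
```

Because the helper array is positional, foo no longer depends on whether the group it receives has a single index or a MultiIndex.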