Group by in pandas using the labels that are *not* grouped by

Problem description

I have written code that evaluates sums like

\sum_i a_{i,j}

(read "sum a over all values of i") by creating a pd.DataFrame with a row for each combination of i and j and then using groupby to perform the sum.

Consider the minimal example

import pandas as pd
from pandas import Series, DataFrame
import numpy as np

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), 
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['i', 'j'])

df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['A', 'B', 'C'])

borrowed from https://pandas.pydata.org/pandas-docs/stable/advanced.html.

To sum over all i, I can do

df.groupby(level=['j']).sum()

or, equivalently,

df.sum(level=['j'])
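
As a quick sanity check, the sketch below rebuilds the example with a seeded random generator and confirms that grouping by level j gives the same result as adding up the rows for each value of j by hand. The seed and the names by_groupby and by_hand are assumptions made for illustration. Only the groupby form is used here because the level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the fixed seed is only there for reproducibility.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["i", "j"]
)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=index, columns=["A", "B", "C"])

# Sum over i by grouping on the remaining level j.
by_groupby = df.groupby(level="j").sum()

# The same sum written out by hand: select each j and add up its rows.
by_hand = pd.DataFrame(
    {j: df.xs(j, level="j").sum() for j in ["one", "two"]}
).T
by_hand.index.name = "j"

# Both routes should agree up to floating-point rounding.
print(np.allclose(by_groupby, by_hand))
```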

This works, but I don't like it for two reasons:

It is not extensible. Whenever I add a new "silent" index, I need to modify all the sums, which are in different places in my code.
I find it hard to understand. In my case i and j have a clear meaning, so I want to state explicitly what I sum over to get self-documenting code.

What I can do is

i = [x for x in df.index.names if x != 'i']
df.sum(level=i)

While this solves the first problem, I don't think the code gets any clearer.

Is there some better pandas functionality or a better-suited (Python) tool for that?
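
For illustration, here is one possible way to make the "everything except i" idea read more explicitly: wrap the list comprehension in a small helper. This is only a sketch, not necessarily what the recommended answer proposes. The name sum_over is hypothetical and not part of pandas, and the small frame below is made-up data chosen so the sums are easy to check.

```python
import numpy as np
import pandas as pd


def sum_over(df, *levels):
    """Sum df over the named index levels, keeping all other levels.

    Hypothetical helper, not a pandas function.
    """
    missing = [name for name in levels if name not in df.index.names]
    if missing:
        raise KeyError(f"unknown index level(s): {missing}")
    # Group by every level that is NOT being summed over.
    keep = [name for name in df.index.names if name not in levels]
    if not keep:
        # Every level is summed away, so return plain column totals.
        return df.sum()
    return df.groupby(level=keep).sum()


# Small deterministic frame with the same index names as the example.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["i", "j"]
)
df = pd.DataFrame(np.arange(12.0).reshape(4, 3), index=index, columns=["A", "B", "C"])

# Reads as "sum over i"; any additional "silent" levels would be kept.
result = sum_over(df, "i")
print(result)
```

With this, a call site names the level being summed away rather than the levels that remain, so adding another index level later would not require touching the call.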

Recommended answer