Groupby in pandas using the labels *not* to group by
Problem description
I have written code that evaluates sums like
$\sum_i a_{i,j}$
(read "sum a over all values of i") by creating a pd.DataFrame with a row for each combination i, j and then using groupby to perform the sum.
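As a concrete illustration of what that groupby computes, here is a minimal sketch on a small, hypothetical frame (deterministic values chosen for illustration, column `a` standing in for a_{i,j}). Grouping on the level that is *not* summed over, `j`, collapses `i`; the explicit loop computes the same sums by hand for comparison.

```python
import pandas as pd

# Hypothetical two-level index (i, j); column 'a' holds a_{i,j}.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["i", "j"]
)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6]}, index=index)

# sum_i a_{i,j}: group on the remaining level 'j', which collapses 'i'.
result = df.groupby(level="j").sum()
print(result)

# The same sums written out explicitly, one running total per value of j.
manual = {}
for (_, j), value in df["a"].items():
    manual[j] = manual.get(j, 0) + value
```

Here `one` collects 1 + 3 + 5 and `two` collects 2 + 4 + 6.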
Consider the minimal example
```python
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['i', 'j'])
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['A', 'B', 'C'])
```
Borrowed from https://pandas.pydata.org/pandas-docs/stable/advanced.html.
To sum over all i, I can do
```python
df.groupby(level=['j']).sum()
```
or, equivalently,
```python
df.sum(level=['j'])  # the level= keyword was deprecated in pandas 1.3 and removed in 2.0
```
This works, but I don't like it for two reasons:
- It is not extensible. Whenever I add a new "silent" index level, I need to modify every sum, and those sums are spread across different places in my code.
- I find it hard to understand. In my case, i and j have a clear meaning, so I want to state explicitly what I sum over to get self-documenting code.
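One way to get both properties is to name the levels being summed over at the call site and let a helper derive the levels to keep. The sketch below assumes a hypothetical helper `sum_over` (not a pandas API) built only on `groupby(level=...)`; the three-level frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd


def sum_over(frame, *levels):
    """Sum `frame` over the named index levels, keeping every other level.

    Hypothetical helper (not part of pandas): callers name what they sum
    over, so adding a new index level does not require editing call sites.
    """
    unknown = set(levels) - set(frame.index.names)
    if unknown:
        raise KeyError(f"unknown index levels: {sorted(unknown)}")
    keep = [name for name in frame.index.names if name not in levels]
    if not keep:
        # Summing over every level leaves nothing to group on: column totals.
        return frame.sum()
    return frame.groupby(level=keep).sum()


index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"], ["x", "y"]], names=["i", "j", "k"]
)
df = pd.DataFrame({"a": np.arange(8)}, index=index)

by_jk = sum_over(df, "i")       # sum over i of a_{i,j,k}, indexed by (j, k)
by_k = sum_over(df, "i", "j")   # sum over i and j, indexed by k
```

The design choice is to validate the names up front, so a typo in a level name fails loudly instead of silently summing over nothing.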
What I can do is
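```python
# every index level except the one summed over ('i')
keep = [x for x in df.index.names if x != 'i']
df.groupby(level=keep).sum()  # df.sum(level=keep) before pandas 2.0
```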
While this solves the first problem, I don't think the code gets any clearer.
Is there some better pandas functionality, or a better-suited (Python) tool, for that?
Recommended answer
Try this:
```python
df.groupby(df.index.droplevel('i')).sum()  # group by every index level except 'i'
```
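`droplevel('i')` returns the index with level `i` removed, and `groupby` then groups rows by those remaining values. A related variant (an alternative, not part of the answer above) passes the *names* of the remaining levels to `level=`, which keeps a proper MultiIndex in the result when two or more levels remain. The sketch below uses a hypothetical three-level frame and also checks that, on a two-level frame, the answer's one-liner agrees with grouping on `j` directly.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"], ["x", "y"]], names=["i", "j", "k"]
)
df = pd.DataFrame({"A": np.arange(8), "B": np.arange(8) * 10}, index=index)

# Names of all index levels except 'i', in their original order ('j', 'k').
remaining = list(df.index.droplevel("i").names)

# Grouping by level name keeps a (j, k) MultiIndex in the result.
summed = df.groupby(level=remaining).sum()

# On a two-level (i, j) frame, the answer's one-liner and the level-name
# form agree: both sum over i for each value of j.
two_level = df.xs("x", level="k")  # select k == 'x' and drop level k
via_answer = two_level.groupby(two_level.index.droplevel("i")).sum()
via_level = two_level.groupby(level="j").sum()
```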