Pandas groupby using the labels *not* to group by

Problem description

I have written code that evaluates sums like

\sum_i a_{i,j}

(read "sum a over all values of i") by creating a pd.DataFrame with one row for each combination of i and j, and then using groupby to perform the sum.

Consider the minimal example

import numpy as np
import pandas as pd

# Two index levels: i (first label) and j (second label)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['i', 'j'])

# One row per combination (i, j); columns A, B, C hold the values a_{i,j}
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['A', 'B', 'C'])

(Example borrowed from https://pandas.pydata.org/pandas-docs/stable/advanced.html.)

To sum over all i, I can do

df.groupby(level=['j']).sum()

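or, equivalently (pandas < 2.0 only: the level keyword of DataFrame.sum was deprecated in 1.3 and removed in 2.0),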

df.sum(level=['j'])
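To check that grouping on the level we keep (j) really computes \sum_i a_{i,j}, here is a minimal sketch. It rebuilds the same index but assumes deterministic values from np.arange instead of random numbers, so the sums can be checked by hand. It compares the groupby result against the same sum written out with plain NumPy.

```python
import numpy as np
import pandas as pd

# Same index as the minimal example; values are np.arange (an assumption) so sums are predictable
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["i", "j"])
df = pd.DataFrame(np.arange(24).reshape(8, 3), index=index, columns=["A", "B", "C"])

# sum_i a_{i,j}: grouping on the kept level j sums over every value of i
by_j = df.groupby(level="j").sum()

# The same sum spelled out with NumPy: select the rows with j == 'one' and add them up
mask = df.index.get_level_values("j") == "one"
manual_one = df.to_numpy()[mask].sum(axis=0)

print(by_j)        # one: 36, 40, 44; two: 48, 52, 56
print(manual_one)  # [36 40 44]
```

Both spellings name the level that survives (j), not the one being summed away (i). That is exactly what the question objects to.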

This works, but I don't like it for two reasons:

  1. It is not extensible. Whenever I add a new "silent" index level, I have to modify every sum, and those sums are scattered across my code.
  2. I find it hard to understand. In my case i and j have a clear meaning, so I want to state explicitly what I sum over, to get self-documenting code.
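Both complaints point toward naming the levels to sum over and deriving the levels to keep. Below is a minimal sketch of that idea as a small helper. The name sum_over is hypothetical and is not a pandas API; it is built only on groupby(level=...). The three-level index with an extra level k is also an illustrative assumption, standing in for a newly added "silent" index.

```python
import numpy as np
import pandas as pd


def sum_over(df: pd.DataFrame, *levels: str) -> pd.DataFrame:
    """Sum df over the named index levels, keeping every other level.

    Hypothetical helper (not part of pandas): callers state what is summed
    over, and any index level they do not mention is kept automatically.
    """
    names = list(df.index.names)
    unknown = [lvl for lvl in levels if lvl not in names]
    if unknown:
        raise KeyError(f"not an index level: {unknown}")
    keep = [name for name in names if name not in levels]
    if not keep:
        raise ValueError("summing over every level; use df.sum() instead")
    return df.groupby(level=keep).sum()


# Hypothetical three-level index: the extra level 'k' needs no change to the calls below
idx = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"], ["x", "y"]], names=["i", "j", "k"]
)
df3 = pd.DataFrame({"A": np.arange(8)}, index=idx)

print(sum_over(df3, "i"))       # sum_i a_{i,j,k}, indexed by (j, k)
print(sum_over(df3, "i", "k"))  # sum_{i,k} a_{i,j,k}, indexed by j
```

The design choice is that each call site names only what it sums over, which reads like \sum_i. Adding a level elsewhere in the code then leaves these calls untouched.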

What I can do is

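# All index levels except 'i', i.e. the levels to keep
levels = [x for x in df.index.names if x != 'i']
df.sum(level=levels)  # pandas >= 2.0: df.groupby(level=levels).sum()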

While this solves the first problem, I don't think the code gets any clearer.

Is there some better pandas functionality, or a better-suited (Python) tool, for this?

Answer

Try this:

df.groupby(df.index.droplevel('i')).sum() # groupby except index 'i'
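The answer works because groupby accepts an Index of the same length as the frame as its grouping key. Dropping level i therefore leaves one group per remaining label. Below is a minimal sketch checking this on the question's two-level example. It again assumes deterministic np.arange values instead of random data, and compares the result with the groupby(level='j') spelling from the question.

```python
import numpy as np
import pandas as pd

# The question's index, with deterministic values (an assumption) instead of random ones
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["i", "j"])
df = pd.DataFrame(np.arange(24).reshape(8, 3), index=index, columns=["A", "B", "C"])

# The answer's approach: drop the level being summed over, group by what remains.
# With only two levels, droplevel("i") leaves a flat Index of the j labels.
remaining = df.index.droplevel("i")
by_remaining = df.groupby(remaining).sum()

# The question's spelling, which names the kept level instead
by_level = df.groupby(level="j").sum()

print(by_remaining)
```

MultiIndex.droplevel also accepts a list of level names, so several levels can be summed over in one call. If two or more levels remain, droplevel returns a MultiIndex rather than a flat Index.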

11-02 23:22