Grouping a DataFrame by the columns of another DataFrame (with the same number of rows)

Problem description

Let's say I have two simple data frames:

import numpy as np
import pandas as pd

x1 = pd.DataFrame({'a':[1,2,3,4],
                   'b':[10,10,20,20],
                   'c':['z','z','z','o']})
x2 = pd.DataFrame({'e':['foo', 'bar', 'foo', 'foo'],
                   'f':['baz', 'blah', 'baz', 'blah']})
> x1
   a   b  c
0  1  10  z
1  2  10  z
2  3  20  z
3  4  20  o
> x2
     e     f
0  foo   baz
1  bar  blah
2  foo   baz
3  foo  blah

I want to apply a function to groups of x1 based on the columns in x2, e.g.:

x1['avg'] = x1.groupby(x2[['e', 'f']])['a'].transform(np.mean)
*** ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional

But I get this ValueError.

The error doesn't occur if the groupby keys come from x1 (but I don't want to have to assign the x2 columns to x1, for code cleanliness reasons I won't get into):

x1.groupby(['b', 'c'])['a'].transform(np.mean)
0    1.5
1    1.5
2    3.0
3    4.0

Why is this happening, and can I get around it?

Recommended answer

You can't pass a DataFrame, but you can pass a (list of) Series:

In [11]: x1.groupby([x2.e, x2.f])["a"].transform("mean")
Out[11]:
0    2
1    2
2    2
3    4
dtype: int64

More generally, you can do this with a list comprehension (if you're grouping by all the columns in another DataFrame):

In [12]: x1.groupby([x2[col] for col in x2])["a"].transform("mean")
Out[12]:
0    2
1    2
2    2
3    4
dtype: int64
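As a self-contained sketch of the list-of-Series approach, the script below rebuilds the two frames from the question and groups x1 by every column of x2. The helper name `keys_from` is hypothetical, introduced only for illustration. It assumes both frames share the same index (here the default 0..3), since Series keys are matched to x1 by index label.

```python
import pandas as pd

x1 = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [10, 10, 20, 20],
                   'c': ['z', 'z', 'z', 'o']})
x2 = pd.DataFrame({'e': ['foo', 'bar', 'foo', 'foo'],
                   'f': ['baz', 'blah', 'baz', 'blah']})


def keys_from(frame):
    """Hypothetical helper: turn every column of `frame` into a groupby key."""
    return [frame[col] for col in frame.columns]


# Group x1 by the columns of x2 without copying them into x1.
avg = x1.groupby(keys_from(x2))['a'].transform('mean')
print(avg.tolist())
```

The result has one value per row of x1, so it can be assigned directly with `x1['avg'] = avg`. The dtype of the result (int64 or float64) may differ between pandas versions.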

That said, you may be better off just going ahead with the join... IMO keeping variables independent is usually a good idea.
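For comparison, here is a minimal sketch of the join route, assuming both frames share the same index. It builds a temporary combined frame with `pd.concat`, so the x2 columns are never added to x1. The name `combined` is illustrative.

```python
import pandas as pd

x1 = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [10, 10, 20, 20],
                   'c': ['z', 'z', 'z', 'o']})
x2 = pd.DataFrame({'e': ['foo', 'bar', 'foo', 'foo'],
                   'f': ['baz', 'blah', 'baz', 'blah']})

# Column-wise concatenation aligns the two frames on their index.
combined = pd.concat([x1, x2], axis=1)

# Group the temporary frame by x2's column names; only the result goes into x1.
x1['avg'] = combined.groupby(['e', 'f'])['a'].transform('mean')
print(x1)
```

Either route gives the same per-row group means. The concat version trades a temporary copy for being able to refer to the keys by column name.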

