Greater-than/less-than comparisons between pandas DataFrames and Series

Question

How can I perform comparisons between DataFrames and Series? I'd like to mask elements in a DataFrame/Series that are greater/less than elements in another DataFrame/Series.

For instance, the following doesn't replace elements greater than the row mean with NaN, although I was expecting it to:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x[x > x.mean(axis=1)] = np.nan
>>> x
   a  b
0  1  3
1  2  4

If we look at the boolean array created by the comparison, it looks really weird:

>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x > x.mean(axis=1)
       a      b      0      1
0  False  False  False  False
1  False  False  False  False

I don't understand the logic that produces this boolean array. I can work around the problem by transposing:

>>> (x.T > x.mean(axis=1).T).T
       a     b
0  False  True
1  False  True

But I believe there is some "correct" way of doing this that I'm not aware of. At the very least, I'd like to understand what is going on.

Recommended answer

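The problem here is that the comparison aligns the Series index (0, 1) with the DataFrame's column labels (a, b) rather than with its rows. No labels match, so the result contains the union of both label sets and every entry is False. If you use .gt and pass axis=0, the Series is aligned with the row index instead, and you get the result you want: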

In [203]:
x.gt(x.mean(axis=1), axis=0)

Out[203]:
       a     b
0  False  True
1  False  True
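
To get from this boolean frame to the result the question asked for, one option is to pass it to DataFrame.mask. The following is a minimal sketch, assuming pandas and NumPy are installed; the names row_mean, mask, and masked are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})

# Row means: a Series indexed by the row labels 0 and 1.
row_mean = x.mean(axis=1)

# axis=0 aligns the Series index with the DataFrame's row index,
# so each row is compared with its own mean.
mask = x.gt(row_mean, axis=0)

# DataFrame.mask replaces entries where the condition is True
# (with NaN by default) and returns a new DataFrame.
masked = x.mask(mask)
print(masked)
```

Because mask returns a new object, x itself is left unchanged; assigning with x[mask] = np.nan would modify it in place instead.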

You can see what I mean when you perform the comparison with the underlying NumPy array instead:

In [205]:
x > x.mean(axis=1).values

Out[205]:
       a      b
0  False  False
1  False   True

Here there are no labels to align on, so the array [2., 3.] is matched against the columns by default: column a is compared with 2 and column b with 3, which gives a different result from the row-wise comparison.
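
The difference between the two directions can be sketched with plain NumPy broadcasting. This is an illustration under stated assumptions, not part of the original answer: the names means, by_column, and by_row are hypothetical, and the row-wise case is done on the raw array rather than through pandas.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
means = x.mean(axis=1).to_numpy()  # the row means 2.0 and 3.0 as a 1-D array

# A 1-D array of length 2 is matched against the columns:
# column 'a' is compared with 2.0 and column 'b' with 3.0.
by_column = x > means

# Reshaping to a (2, 1) column vector makes NumPy broadcast along rows,
# so row 0 is compared with 2.0 and row 1 with 3.0.
by_row = x.to_numpy() > means[:, None]

print(by_column)
print(by_row)
```

The row-wise array matches what x.gt(x.mean(axis=1), axis=0) returns, but without the row and column labels, which is one reason the label-aware .gt call is usually the more convenient choice.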

