Problem description
How can I perform comparisons between DataFrames and Series? I'd like to mask the elements of a DataFrame/Series that are greater/less than the corresponding elements of another DataFrame/Series.
For instance, the following doesn't replace the elements greater than the row mean with NaNs, although I was expecting it to:
>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x[x > x.mean(axis=1)] = np.nan
>>> x
a b
0 1 3
1 2 4
If we look at the boolean array created by the comparison, it is really weird:
>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x > x.mean(axis=1)
a b 0 1
0 False False False False
1 False False False False
I don't understand the logic that produces this boolean array. I can work around the problem by transposing:
>>> (x.T > x.mean(axis=1).T).T
a b
0 False True
1 False True
But I believe there is some "correct" way of doing this that I'm not aware of. At the very least, I'd like to understand what is going on.
Recommended answer
The problem here is that the comparison matches the Series' index against the DataFrame's column labels. If you use .gt and pass axis=0, you get the result you want:
In [203]:
x.gt(x.mean(axis=1), axis=0)
Out[203]:
a b
0 False True
1 False True
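As a self-contained sketch of the same idea, the snippet below builds the mask with .gt(..., axis=0) and then applies it with DataFrame.mask, which returns a new frame rather than assigning in place. The names row_mean, above_mean and masked are only illustrative and do not come from the original post.

```python
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})

# Row-wise mean: a Series indexed by the row labels 0 and 1
row_mean = x.mean(axis=1)

# axis=0 matches the Series' index against the DataFrame's row index,
# so every row is compared with its own mean
above_mean = x.gt(row_mean, axis=0)

# mask() returns a copy with NaN wherever the condition is True
masked = x.mask(above_mean)
print(masked)
```

For the less-than case, .lt takes the same axis argument.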
You can see what I mean when you perform the comparison with the underlying NumPy array:
In [205]:
x > x.mean(axis=1).values
Out[205]:
a b
0 False False
1 False True
Here you can see that by default the comparison runs along the columns: the first value is compared with column a and the second with column b, which gives a different result.
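To make the default behaviour concrete, here is a small sketch contrasting two cases: a Series whose index holds the column labels, and a bare NumPy array that has no labels at all. The names col_mean, by_columns and by_position are illustrative, and .to_numpy() is used in place of .values on the assumption that a reasonably recent pandas is installed.

```python
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})

# Column-wise mean: a Series indexed by the column labels 'a' and 'b'
col_mean = x.mean(axis=0)

# With the > operator the Series' index is matched against the column
# labels, so each column is compared with its own mean
by_columns = x > col_mean

# A NumPy array carries no labels, so its values are applied by position
# along the columns: 2.0 is compared with column a, 3.0 with column b
by_position = x > x.mean(axis=1).to_numpy()

print(by_columns)
print(by_position)
```

When the Series is indexed by row labels instead, as x.mean(axis=1) is, the label matching finds no overlap with the columns a and b, which is why the operator form does not give the row-wise comparison.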