Problem description
I reviewed the following posts beforehand. Is there a way to use DataFrame.isin() with an approximation factor or a tolerance value? Or is there another method that can do this?
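Example:
import pandas as pd
df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
In : df
Out:
     A    B
0  5.0  1.0
1  6.0  2.0
2  3.3  3.2
3  4.0  5.0
What I would like to write (isin() does not accept a tol argument):
df[df['A'].isin([3, 6], tol=.5)]
Desired output:
     A    B
1  6.0  2.0
2  3.3  3.2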
Recommended answer
You can use np.isclose with broadcasting:
import numpy as np
df[np.isclose(df['A'].values[:, None], [3, 6], atol=.5).any(axis=1)]
Out:
     A    B
1  6.0  2.0
2  3.3  3.2
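One caveat about the tolerance: np.isclose checks |a - b| <= atol + rtol * |b|, and rtol defaults to 1e-05, so the effective tolerance grows slightly with the size of the reference value. For the small numbers in this example the difference is negligible. The sketch below uses made-up large values (not from the question) to show the effect, and passes rtol=0 to get a purely absolute tolerance.

```python
import numpy as np

# Illustrative values only: a large reference value makes the rtol term visible.
a, b = 100000.9, 100000.0

# Default rtol=1e-05: the bound is 0.5 + 1e-05 * 100000 = 1.5, so 0.9 passes.
default_rtol = np.isclose(a, b, atol=0.5)

# rtol=0: the bound is exactly atol=0.5, so 0.9 fails.
strict = np.isclose(a, b, atol=0.5, rtol=0)

print(default_rtol, strict)
```

If you want tol to mean a strict absolute distance, adding rtol=0 to the calls in this answer is probably the safer choice.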
np.isclose returns this:
np.isclose(df['A'].values[:, None], [3, 6], atol=.5)
Out:
array([[False, False],
       [False,  True],
       [ True, False],
       [False, False]], dtype=bool)
This array is a pairwise comparison of the elements of df['A'] against [3, 6] (that's why we need df['A'].values[:, None]: the extra axis lets the two arrays broadcast). Since you want to know whether each element is close to any one of the values in the list, we call .any(axis=1) at the end.
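The same idea can be wrapped in a small function. This is only a sketch: the name isin_close is hypothetical (it is not a pandas or NumPy API), and rtol=0 is my own choice so that atol is the only tolerance applied.

```python
import numpy as np
import pandas as pd


def isin_close(series, values, atol=0.5):
    """Hypothetical helper: True where an element is within atol of any value."""
    # A column of shape (n, 1) against a row of shape (m,) broadcasts to (n, m).
    column = series.to_numpy()[:, None]
    close = np.isclose(column, np.asarray(values, dtype=float), atol=atol, rtol=0)
    # Collapse the m comparisons per row: is the element close to at least one value?
    return close.any(axis=1)


df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
mask = isin_close(df['A'], [3, 6], atol=0.5)
print(df[mask])
```

With the example frame this keeps rows 1 and 2, matching the output shown earlier in the answer.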
For multiple columns, change the slice a little bit:
mask = np.isclose(df[['A', 'B']].values[:, :, None], [3, 6], atol=0.5).any(axis=(1, 2))
mask
Out: array([False, True, True, False], dtype=bool)
You can use this mask to slice the DataFrame (i.e. df[mask]).
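To see what the multi-column version is doing, it can help to look at the intermediate shapes. The sketch below reuses the example frame; the variable names (targets, close, per_column) are mine, chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
targets = [3, 6]

# (rows, columns, 1) against (targets,) broadcasts to (rows, columns, targets).
values = df[['A', 'B']].to_numpy()[:, :, None]   # shape (4, 2, 1)
close = np.isclose(values, targets, atol=0.5)    # shape (4, 2, 2)

# Reducing only the last axis tells you which column matched in each row.
per_column = close.any(axis=2)                   # shape (4, 2)

# Reducing both trailing axes gives one flag per row, as in the answer.
mask = close.any(axis=(1, 2))                    # shape (4,)

print(close.shape, per_column.shape, mask.shape)
print(df[mask])
```

Keeping per_column around is optional, but it is handy when you need to know which column produced the match rather than just whether the row matched.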
If you want to compare df['A'] and df['B'] (and possibly other columns) against different lists of values, you can create a separate mask for each column:
mask1 = np.isclose(df['A'].values[:, None], [1, 2, 3], atol=.5).any(axis=1)
mask2 = np.isclose(df['B'].values[:, None], [4, 5], atol=.5).any(axis=1)
mask3 = ...
Then slice:
df[mask1 & mask2] # or df[mask1 & mask2 & mask3 & ...]
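When there are many columns, the masks can be built in a loop instead of being written out one by one. A minimal sketch, assuming a dict (here called targets, a name I made up) that maps each column to its list of values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

# Hypothetical mapping: column name -> values to match approximately.
targets = {'A': [1, 2, 3], 'B': [4, 5]}

# One boolean mask per column, built the same way as mask1 and mask2.
masks = [
    np.isclose(df[col].to_numpy()[:, None], vals, atol=0.5).any(axis=1)
    for col, vals in targets.items()
]

# Equivalent to mask1 & mask2 & ...: every column must match.
all_match = np.logical_and.reduce(masks)
# Equivalent to mask1 | mask2 | ...: at least one column must match.
any_match = np.logical_or.reduce(masks)

print(df[all_match])
print(df[any_match])
```

Note that with the example frame and these particular lists no row satisfies both conditions, so df[all_match] is empty, while df[any_match] keeps rows 2 and 3. Choose & or | depending on which behaviour you need.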