Problem description
I discovered the hard way that the pandas `in` operator, applied to a Series, operates on the index and not on the actual data:
In [1]: import pandas as pd
In [2]: x = pd.Series([1, 2, 3])
In [3]: x.index = [10, 20, 30]
In [4]: x
Out[4]:
10 1
20 2
30 3
dtype: int64
In [5]: 1 in x
Out[5]: False
In [6]: 10 in x
Out[6]: True
My intuition was that the Series x contains the number 1 and not the index 10, which is apparently wrong. What is the reason behind this behavior? Are the following approaches the best possible alternatives?
In [7]: 1 in set(x)
Out[7]: True
In [8]: 1 in list(x)
Out[8]: True
In [9]: 1 in x.values
Out[9]: True
Update
I did some timings on my suggestions. It looks like x.values is the fastest of the three:
In [20]: import numpy as np
In [21]: x = pd.Series(np.random.randint(0, 100000, 1000))
In [22]: x.index = np.arange(900000, 900000 + 1000)
In [23]: x.tail()
Out[23]:
900995 88999
900996 13151
900997 25928
900998 36149
900999 97983
dtype: int64
In [24]: %timeit 36149 in set(x)
10000 loops, best of 3: 190 µs per loop
In [25]: %timeit 36149 in list(x)
1000 loops, best of 3: 638 µs per loop
In [26]: %timeit 36149 in (x.values)
100000 loops, best of 3: 6.86 µs per loop
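For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The fixed seed, the choice of target value, and the repeat count of 200 are my own assumptions, not part of the original session, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
x = pd.Series(rng.integers(0, 100000, 1000),
              index=np.arange(900000, 900000 + 1000))

# Pick a value that is known to be present in the data.
target = int(x.iloc[-2])

candidates = {
    "set(x)": lambda: target in set(x),
    "list(x)": lambda: target in list(x),
    "x.values": lambda: target in x.values,
}

results = {}
for label, check in candidates.items():
    assert check()  # every approach should find the value
    # Average time per call over 200 runs (an arbitrary repeat count).
    results[label] = timeit.timeit(check, number=200) / 200

for label, seconds in results.items():
    print(f"{label:10s} {seconds * 1e6:8.2f} µs per call")
```

Note that set(x) and list(x) each pay for building a new container on every call, which is likely a large part of the gap in the timings above.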
Recommended answer
It may be helpful to think of the pandas.Series as being a bit like a dictionary, where the index values are equivalent to the keys. Compare:
>>> d = {'a': 1}
>>> 1 in d
False
>>> 'a' in d
True
with:
>>> import pandas
>>> s = pandas.Series([1], index=['a'])
>>> 1 in s
False
>>> 'a' in s
True
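To make the analogy concrete, here is a small sketch of the same Series showing that `in` behaves like a lookup in the index, plus two ways to test the data instead. The variable names are just for illustration, and isin is one option among several rather than the recommended one.

```python
import pandas as pd

s = pd.Series([1], index=["a"])

# Like a dict, a Series exposes its index through keys().
keys = list(s.keys())

# `in` on the Series is a lookup in the index ...
in_series = "a" in s
in_index = "a" in s.index

# ... so a check against the data has to name the data explicitly.
in_values = 1 in s.values
in_isin = bool(s.isin([1]).any())

print(keys, in_series, in_index, in_values, in_isin)
```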
However, note that iterating over the series iterates over the data, not the index, so list(s) would give [1], not ['a'].
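This is where the dictionary analogy breaks down, so a short side-by-side sketch may help. The two-element Series here is a made-up example, not the one from the question.

```python
import pandas as pd

s = pd.Series([1, 2], index=["a", "b"])

values = list(s)         # iterating a Series yields the data
labels = list(s.index)   # the index has to be requested explicitly
pairs = list(s.items())  # (label, value) pairs, similar to dict.items()

# A dict built from the same pairs iterates over its keys instead.
d = dict(pairs)
dict_iter = list(d)

print(values, labels, pairs, dict_iter)
```

So membership (`in`) follows the dict convention and looks at the labels, while iteration follows the array convention and walks the values.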
Indeed, per the documentation, the index values "must be unique and hashable", so I'd guess there's a hash table under there somewhere.