It seems that in Pandas you can do either of the following:
age_is_null = pd.isnull(titanic_survival["age"])
age_is_null = titanic_survival["age"].isnull()
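So both exist: a top-level function in the pandas module and a method on the DataFrame/Series classes (defined in another module).
Coming from an Obj-C background, I find this confusing. Why are both needed?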
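A minimal sketch showing that, on a Series, the two spellings return the same boolean mask. The `titanic_survival` frame below is a small hypothetical stand-in for the real Titanic dataset used in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data used in the question
titanic_survival = pd.DataFrame({"age": [22.0, np.nan, 38.0, np.nan]})

# Module-level function: the object is passed in as an argument
mask_func = pd.isnull(titanic_survival["age"])

# Method: called on the Series itself
mask_method = titanic_survival["age"].isnull()

# Both are boolean Series with the same index and values
print(mask_func.equals(mask_method))  # True
print(mask_func.tolist())             # [False, True, False, True]
```

For a Series or DataFrame you already have, the choice is mostly a matter of style.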
Best answer
pd.isnull works on inputs of many types (lists, NumPy arrays, other array-likes, and even scalars), for example:
>>> import pandas as pd
>>> import numpy as np
>>> pd.isnull(np.array([1, 2]))
array([False, False], dtype=bool)
>>> pd.isnull([1, 2])
array([False, False], dtype=bool)
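Because pd.isnull is a plain function rather than a method, it also accepts objects that have no .isnull method at all, such as scalars and Python lists. A short sketch; note that in pandas 0.21 and later, pd.isna is the newer name for the same check.

```python
import numpy as np
import pandas as pd

# Scalars: no container object is needed
print(pd.isnull(None))      # True
print(pd.isnull(np.nan))    # True
print(pd.isnull("text"))    # False

# A plain list containing a missing value returns a NumPy boolean array
result = pd.isnull([1.0, None, 3.0])
print(result.tolist())      # [False, True, False]

# pd.isna gives the same answer as pd.isnull
print(np.array_equal(pd.isna([1.0, None, 3.0]), result))  # True
```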
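whereas df.isnull is a method bound to a DataFrame object, so you need a DataFrame before you can call it. If your data is not already a DataFrame (for example, a list or a NumPy array) and building one just to check for missing values would be costly, use pd.isnull instead. Timings: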
In [30]: %timeit pd.isnull([1,2])
The slowest run took 8.93 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.19 µs per loop
In [33]: %timeit pd.DataFrame([1,2]).isnull()
The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 202 µs per loop
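A sketch for reproducing the comparison outside IPython with the standard-library timeit module. It also times df.isnull() on a prebuilt DataFrame, to see how much of the gap comes from constructing the DataFrame. Absolute numbers depend on the machine and the pandas version, so no expected output is given.

```python
import timeit

import pandas as pd

data = [1, 2]
N = 1_000  # number of calls per measurement

# Function on a plain list: no DataFrame is built
t_func = timeit.timeit(lambda: pd.isnull(data), number=N)

# Method route: build a DataFrame every time, then call .isnull()
t_build_and_call = timeit.timeit(lambda: pd.DataFrame(data).isnull(), number=N)

# Method on a DataFrame that already exists: construction cost excluded
df = pd.DataFrame(data)
t_call_only = timeit.timeit(lambda: df.isnull(), number=N)

for label, total in [
    ("pd.isnull(list)", t_func),
    ("pd.DataFrame(list).isnull()", t_build_and_call),
    ("df.isnull() (prebuilt df)", t_call_only),
]:
    # Convert total seconds to microseconds per call
    print(f"{label:30s} {total / N * 1e6:8.2f} µs per call")
```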