Question
I have a large DataFrame in pandas that, apart from the column used as the index, is supposed to contain only numeric values:
import pandas

df = pandas.DataFrame({"item": ["a", "b", "c", "d", "e"], "a": [1, 2, 3, "bad", 5], "b": [0.1, 0.2, 0.3, 0.4, 0.5]})
df = df.set_index("item")
How can I find the row of the DataFrame df that has a non-numeric value in it? In this example it's the fourth row, which has the string "bad" in the a column. How can this row be found programmatically? Thanks.
Recommended answer
You could use np.isreal to check the type of each element (applymap applies a function to each element in the DataFrame):
In [11]: df.applymap(np.isreal)
Out[11]:
          a     b
item
a      True  True
b      True  True
c      True  True
d     False  True
e      True  True
If all the values in a row are True, then that row is entirely numeric:
In [12]: df.applymap(np.isreal).all(1)
Out[12]:
item
a     True
b     True
c     True
d    False
e     True
dtype: bool
So to get the sub-DataFrame of rogues (note: the negation, ~, of the above finds the rows that have at least one rogue non-numeric value):
In [13]: df[~df.applymap(np.isreal).all(1)]
Out[13]:
        a    b
item
d     bad  0.4
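For reference, the steps above can be put together in one standalone script. This is only a sketch under a few assumptions: it applies Series.map column by column instead of DataFrame.applymap (which is deprecated in recent pandas releases in favour of DataFrame.map), and a plain isinstance test stands in for np.isreal. The helper name is_number and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "item": ["a", "b", "c", "d", "e"],
        "a": [1, 2, 3, "bad", 5],
        "b": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
).set_index("item")


def is_number(value):
    """Hypothetical helper: True for int/float values, False otherwise."""
    return isinstance(value, (int, float))


# Element-wise check, done column by column with Series.map
# (assumption: used here in place of DataFrame.applymap).
numeric_cells = df.apply(lambda column: column.map(is_number))

# A row is clean only if every cell in it is numeric.
numeric_rows = numeric_cells.all(axis=1)

# Negate the mask to keep the rows with at least one non-numeric cell.
bad_rows = df[~numeric_rows]
print(bad_rows)
```

With the example data, bad_rows should contain only the row labelled d.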
To find the location of the first offender, you could use argmin:
In [14]: np.argmin(df.applymap(np.isreal).all(1))
Out[14]: 'd'
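The 'd' shown above comes from an older pandas. As far as I know, in current versions np.argmin on a Series returns the integer position rather than the index label, so it may be clearer to ask for the label and the position separately. A minimal sketch, assuming the row mask looks like Out[12]; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Row mask like Out[12]: True where every value in the row is numeric.
numeric_rows = pd.Series(
    [True, True, True, False, True],
    index=pd.Index(["a", "b", "c", "d", "e"], name="item"),
)

# idxmax/argmin still return the first entry when nothing is False,
# so check that an offender exists before trusting the result.
has_offender = bool((~numeric_rows).any())

# Index label of the first non-numeric row.
first_label = (~numeric_rows).idxmax() if has_offender else None

# Integer position of the first non-numeric row.
first_position = int(np.argmin(numeric_rows.to_numpy())) if has_offender else None

print(first_label, first_position)
```

The has_offender guard matters because on an all-True mask both calls would silently point at the first row.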
As @CTZhu points out, it may be slightly faster to check whether each value is an instance of int or float (np.isreal carries some additional overhead):
df.applymap(lambda x: isinstance(x, (int, float)))
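One thing to keep in mind with the isinstance check is that bool is a subclass of int in Python, so True and False would count as numeric. If that is not wanted, a slightly stricter predicate could be used instead. The sketch below is only one possible variant; the name is_real_number is hypothetical, and using numbers.Real is my own choice rather than part of the original answer.

```python
import numbers


def is_real_number(value):
    """True for real numbers, excluding bool (which is a subclass of int)."""
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


samples = [1, 0.5, float("nan"), True, "bad", None, 1 + 2j]
results = [is_real_number(v) for v in samples]
print(results)
```

Such a function can be passed to applymap in place of the lambda, for example df.applymap(is_real_number). Note that NaN is a float and therefore still counts as numeric here.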