Finding non-numeric rows in a pandas DataFrame

Problem description

I have a large DataFrame in pandas that, apart from the column used as the index, is supposed to contain only numeric values:

import numpy as np
import pandas

df = pandas.DataFrame({"item": ["a", "b", "c", "d", "e"], "a": [1, 2, 3, "bad", 5], "b": [0.1, 0.2, 0.3, 0.4, 0.5]})
df = df.set_index("item")

How can I find the row of the DataFrame df that has a non-numeric value in it? In this example it is the fourth row, which has the string "bad" in column a. How can this row be found programmatically? Thanks.

Recommended answer

You could use np.isreal to test each element. applymap applies a function to every element of the DataFrame; since pandas 2.1 it is deprecated in favor of the equivalent DataFrame.map:

In [11]: df.applymap(np.isreal)
Out[11]:
          a     b
item
a      True  True
b      True  True
c      True  True
d     False  True
e      True  True
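Because the mask is element-wise, it can also pinpoint the exact offending cells, not just the rows. The following is a minimal sketch assuming pandas and NumPy are installed; the name `offending_cells` is illustrative only. It calls `DataFrame.map` when the class provides it (pandas 2.1+) and falls back to `applymap` otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
                   "a": [1, 2, 3, "bad", 5],
                   "b": [0.1, 0.2, 0.3, 0.4, 0.5]}).set_index("item")

# Same element-wise mask as above; DataFrame.map replaces applymap in pandas 2.1+
cell_ok = df.map(np.isreal) if hasattr(pd.DataFrame, "map") else df.applymap(np.isreal)

# np.where on the negated mask gives the (row, column) positions of failing cells
rows, cols = np.where(~cell_ok.to_numpy(dtype=bool))
offending_cells = list(zip(cell_ok.index[rows], cell_ok.columns[cols]))
print(offending_cells)
```

Keep in mind that np.isreal is not strictly a type check: it tests whether the imaginary part is zero. It rejects "bad" because a string cannot be compared with 0, so it would also reject a numeric string such as "3", and it accepts a complex number whose imaginary part is zero.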

If every value in a row is True, the whole row is numeric:

In [12]: df.applymap(np.isreal).all(1)
Out[12]:
item
a        True
b        True
c        True
d       False
e        True
dtype: bool

So to get the sub-DataFrame of rogues, negate the result above with ~, which selects the rows that contain at least one non-numeric value:

In [13]: df[~df.applymap(np.isreal).all(1)]
Out[13]:
        a    b
item
d     bad  0.4
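The three steps (element-wise check, row-wise all, negation) can be bundled into one reusable function. This is a sketch, and `non_numeric_rows` is a hypothetical name; it calls DataFrame.map on pandas 2.1 and later, where applymap is deprecated, and applymap on earlier versions.

```python
import numpy as np
import pandas as pd


def non_numeric_rows(frame):
    """Return the rows of frame that contain at least one cell np.isreal rejects."""
    # Step 1: element-wise check (DataFrame.map in pandas 2.1+, applymap before that)
    if hasattr(pd.DataFrame, "map"):
        cell_ok = frame.map(np.isreal)
    else:
        cell_ok = frame.applymap(np.isreal)
    # Step 2: a row is numeric only if every cell in it passed
    row_ok = cell_ok.all(axis=1).astype(bool)
    # Step 3: negate to keep the rows with at least one offender
    return frame[~row_ok]


df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
                   "a": [1, 2, 3, "bad", 5],
                   "b": [0.1, 0.2, 0.3, 0.4, 0.5]}).set_index("item")
print(non_numeric_rows(df))
```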

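You could also find the location of the first offender with idxmin. (In older pandas, np.argmin on a Series returned this label, but since pandas 1.0 it returns the integer position instead, 3 in this case.)

In [14]: df.applymap(np.isreal).all(1).idxmin()
Out[14]: 'd'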
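When looking up the first offender, note that if every row is numeric, the boolean Series has no False value and idxmin simply returns the first label. The sketch below, with a hypothetical helper `first_offender`, guards against that case and also shows the positional alternative via np.argmin on the underlying NumPy array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
                   "a": [1, 2, 3, "bad", 5],
                   "b": [0.1, 0.2, 0.3, 0.4, 0.5]}).set_index("item")

# Row-wise "all cells pass np.isreal"; DataFrame.map replaces applymap in pandas 2.1+
cell_ok = df.map(np.isreal) if hasattr(pd.DataFrame, "map") else df.applymap(np.isreal)
row_ok = cell_ok.all(axis=1)


def first_offender(row_is_numeric):
    """Return the label of the first row that is not all-numeric, or None."""
    if row_is_numeric.all():
        # No False anywhere: idxmin would just return the first label
        return None
    # Cast to int (False -> 0, True -> 1) so the first minimum is the first False
    return row_is_numeric.astype(int).idxmin()


label = first_offender(row_ok)                # index label of the first offender
position = int(np.argmin(row_ok.to_numpy()))  # integer position of the same row
print(label, position)
```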

As @CTZhu points out, it may be slightly faster to check whether it's an instance of either int or float (there is some additional overhead with np.isreal):

df.applymap(lambda x: isinstance(x, (int, float)))
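The isinstance variant can be turned into a row filter in the same way. In this sketch (all helper names are hypothetical), `is_number` also excludes bool, because bool is a subclass of int and `isinstance(True, (int, float))` is True; that exclusion is an assumption, so drop it if booleans should count as numeric. Series.map is applied column by column, which works on pandas versions both before and after the applymap deprecation. For comparison, `non_numeric_rows_coerce` uses a different technique, `pd.to_numeric` with `errors="coerce"`, which treats numeric strings such as "3" as numeric; which behavior is right depends on the data.

```python
import pandas as pd

df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
                   "a": [1, 2, 3, "bad", 5],
                   "b": [0.1, 0.2, 0.3, 0.4, 0.5]}).set_index("item")


def is_number(x):
    # bool subclasses int; excluding it is an assumption, remove if True/False are valid data
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def non_numeric_rows_isinstance(frame):
    # Series.map applies is_number cell by cell within each column
    mask = frame.apply(lambda col: col.map(is_number))
    return frame[~mask.all(axis=1)]


def non_numeric_rows_coerce(frame):
    # Different technique: cells pd.to_numeric cannot parse become NaN
    converted = frame.apply(pd.to_numeric, errors="coerce")
    # Only flag cells that were present originally, so genuine NaNs are not counted
    bad = converted.isna() & frame.notna()
    return frame[bad.any(axis=1)]


print(non_numeric_rows_isinstance(df))
print(non_numeric_rows_coerce(df))
```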
