pandas: column with mixed data types; how to find the exceptions


Question

I have a large dataframe, and reading it produces this message: DtypeWarning: Columns (0,8) have mixed types. Specify dtype option on import or set low_memory=False.
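The warning names two options. A minimal sketch of the first one follows, assuming a hypothetical CSV whose suspect column is called AnnoyingColumn; the in-memory text stands in for the real file, and the sample values are invented for illustration. Reading the column as strings means pandas does not have to guess a type chunk by chunk, so every value ends up with the same type and can be inspected afterwards.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: a column that should be numeric
# but contains one stray string.
csv_text = "AnnoyingColumn,other\n1.5,a\n2.0,b\noops,c\n4,d\n"

# Read the suspect column as plain strings so every value has the same type.
df = pd.read_csv(io.StringIO(csv_text), dtype={"AnnoyingColumn": str})

# Show the distinct Python types found in the column.
print(df["AnnoyingColumn"].map(type).unique())
```

Passing low_memory=False instead makes pandas read the file in one pass before inferring types, which silences the warning but uses more memory while parsing.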

It is supposed to be a column of floats, but I suspect a few strings snuck in. I would like to identify them and possibly remove them.

I tried:

df.apply(lambda row: isinstance(row.AnnoyingColumn, (int, float)), 1)

But that gave me an out-of-memory error.

I assume there must be a better way.

Recommended answer

This gives True for every value that is a float:

df.some_column.apply(lambda x: isinstance(x, float))

This gives True for every value that is an int or a string:

df.some_column.apply(lambda x: isinstance(x, (int,str)))
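Before choosing which types to filter on, it can help to see which types are actually present. A small sketch, assuming made-up sample data in a column named some_column as in the snippets above:

```python
import pandas as pd

# Hypothetical sample; some_column mirrors the name used in the answer.
df = pd.DataFrame({"some_column": [1.0, 2.5, "oops", 4, "n/a"]})

# Map every value to its Python type, then count how often each type occurs.
type_counts = df["some_column"].map(type).value_counts()
print(type_counts)
```

Like the isinstance checks above, this works on a single column rather than on whole rows, so it avoids building a Series object for every row the way df.apply(..., 1) does.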

So, to remove the strings:

mask = df.some_column.apply(lambda x: isinstance(x, str))
df = df[~mask]
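A different technique from the isinstance mask above is to let pandas attempt the conversion and flag whatever fails. The sketch below uses pd.to_numeric with errors="coerce"; the sample data is hypothetical. Note the difference in behavior: a string such as "2.0" is parsed into a number here, while the isinstance mask would treat it as a string and drop it.

```python
import pandas as pd

# Hypothetical sample standing in for the real data.
df = pd.DataFrame({"some_column": [1.5, "2.0", "oops", 4, None]})

# Convert what can be converted; anything unparseable becomes NaN.
as_number = pd.to_numeric(df["some_column"], errors="coerce")

# The exceptions: present in the original but not parseable as a number.
bad = as_number.isna() & df["some_column"].notna()
print(df[bad])

# Keep the remaining rows and store the column as real floats.
clean = df[~bad].assign(some_column=as_number[~bad])
```

Values that were already missing stay in clean as NaN; only the rows whose value could not be parsed are dropped.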

Example that removes floats and strings:

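>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2.0, 'hi', 4]})
>>> df
     a
0    1
1  2.0
2   hi
3    4

>>> # True where the value is a float or a string
>>> mask = df.a.apply(lambda x: isinstance(x, (float, str)))
>>> mask
0    False
1     True
2     True
3    False
Name: a, dtype: bool

>>> # Keep only the rows where the mask is False
>>> df = df[~mask]
>>> df
   a
0  1
3  4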
