pandas: column with mixed datatypes; how to find the exceptions

Question

I have a large dataframe, and when I read it, pandas gives me this message: DtypeWarning: Columns (0,8) have mixed types. Specify dtype upon import or set low_memory=False.

It is supposed to be a column of floats, but I suspect a few strings snuck in there. I would like to identify them, and possibly remove them.

I tried df.apply(lambda row: isinstance(row.AnnoyingColumn, (int, float)), 1), but that gave me an out-of-memory error.

I assume there must be a better way.

Answer

This gives True for each element that is a float:

df.some_column.apply(lambda x: isinstance(x, float))

This gives True for each element that is an int or a string:

df.some_column.apply(lambda x: isinstance(x, (int,str)))
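To see at a glance which Python types actually occur in a column, the same per-element idea can be turned into a count. This is a minimal sketch, not part of the original answer: the Series s and its values are hypothetical, and it uses Series.map(type) followed by value_counts() instead of one isinstance check per type.

```python
import pandas as pd

# Hypothetical object-dtype column standing in for the question's AnnoyingColumn
s = pd.Series([1.5, 2.0, "oops", 3.25, "n/a"], dtype=object)

# Map every element to its Python type, then count how often each type occurs
type_counts = s.map(type).value_counts()
print(type_counts)

# Pull out just the elements that are strings
strings = s[s.map(lambda x: isinstance(x, str))]
print(strings)
```

Because this only touches one column, it avoids building a full row for every call the way the row-wise df.apply(..., 1) attempt does.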

So, to remove the strings:

mask = df.some_column.apply(lambda x: isinstance(x, str))
df = df[~mask]
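If the goal is to find values that cannot be read as numbers (rather than values whose Python type is str), pd.to_numeric with errors="coerce" is a vectorized alternative that the original answer does not use. A minimal sketch, assuming a hypothetical DataFrame with a column named some_column. Note the behavioural difference: a numeric-looking string such as "2.5" is converted rather than flagged, unlike the isinstance(x, str) mask.

```python
import pandas as pd

# Hypothetical frame: some_column should hold floats but has stray strings
df = pd.DataFrame({"some_column": [1.5, "2.5", "oops", 4.0, None]})

# Convert to numbers; anything that cannot be parsed becomes NaN
numeric = pd.to_numeric(df["some_column"], errors="coerce")

# "Bad" entries are NaN after conversion but were not missing to begin with
bad = numeric.isna() & df["some_column"].notna()
print(df[bad])

# Keep the remaining rows and store the converted float values
clean = df[~bad].assign(some_column=numeric[~bad])
print(clean.dtypes)
```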

Example that removes floats and strings:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2.0, 'hi', 4]})
>>> df
     a
0    1
1  2.0
2   hi
3    4

>>> mask = df.a.apply(lambda x: isinstance(x, (float, str)))
>>> mask
0    False
1     True
2     True
3    False
Name: a, dtype: bool

>>> df = df[~mask]
>>> df
   a
0  1
3  4
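The warning quoted in the question suggests specifying dtype on import. One way to combine that with the checks above is to read the suspect column as strings, so pandas does not guess the type chunk by chunk, and then test which entries parse as floats. This is a sketch under assumptions: the CSV text and the column name value are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the large CSV from the question
csv_text = "value\n1.5\n2.0\noops\n3.25\n"

# Read the suspect column as strings instead of letting pandas infer it
df = pd.read_csv(io.StringIO(csv_text), dtype={"value": str})

# Try to parse every entry as a number; failures become NaN
as_float = pd.to_numeric(df["value"], errors="coerce")

# Ignore cells that were empty in the file; flag only unparseable text
bad_values = df.loc[as_float.isna() & df["value"].notna(), "value"]
print(bad_values)

# The parseable entries, now stored as floats
clean = as_float[as_float.notna()]
print(clean)
```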

08-26 21:47