Problem description
I have a large dataframe, and when reading it, it gives me this message: DtypeWarning: Columns (0,8) have mixed types. Specify dtype option on import or set low_memory=False.
It is supposed to be a column of floats, but I suspect a few strings snuck in there. I would like to identify them, and possibly remove them.
I tried
df.apply(lambda row: isinstance(row.AnnoyingColumn, (int, float)), 1)
But that gave me an out of memory error.
I assume there must be a better way.
Recommended answer
This gives you True for every value that is a float:
df.some_column.apply(lambda x: isinstance(x, float))
This gives you True for every value that is an int or a string:
df.some_column.apply(lambda x: isinstance(x, (int,str)))
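Before building a mask, it can help to see which Python types are actually present in the column. The following is a minimal sketch, not part of the original answer; the sample values are made up and `some_column` is just the placeholder name used above.

```python
import pandas as pd

# Hypothetical sample data standing in for the real column.
df = pd.DataFrame({"some_column": [1.5, "oops", 2.0, "n/a", 3]})

# Map every value to the name of its Python type, then count each name.
type_counts = df["some_column"].map(lambda x: type(x).__name__).value_counts()
print(type_counts)
```

Because this works on a single column rather than on whole rows, it should be considerably lighter than the row-wise `df.apply(..., 1)` attempt from the question.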
So, to remove the strings:
mask = df.some_column.apply(lambda x: isinstance(x, str))
df = df[~mask]
Example that removes floats and strings:
$ df = pd.DataFrame({'a': [1,2.0,'hi',4]})
$ df
a
0 1
1 2.0
2 hi
3 4
$ mask = df.a.apply(lambda x: isinstance(x, (float,str)))
$ mask
0 False
1 True
2 True
3 False
Name: a, dtype: bool
$ df = df[~mask]
$ df
a
0 1
3 4
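One caveat: when a column is read from a CSV file with mixed types, some of the strings may be perfectly valid numbers stored as text (for example "2.5"), and an isinstance check would flag those too. An alternative, which is not from the original answer, is to let `pd.to_numeric` with `errors="coerce"` attempt the conversion and then look at what failed. A sketch under that assumption, with made-up data:

```python
import pandas as pd

# Hypothetical sample: a numeric string ("2.5") and a genuine outlier ("hi").
df = pd.DataFrame({"some_column": [1, "2.5", "hi", 4.0]})

# Values that cannot be parsed as numbers become NaN.
converted = pd.to_numeric(df["some_column"], errors="coerce")

# A value is an exception if conversion failed but it was not missing to begin with.
bad = converted.isna() & df["some_column"].notna()

# The original values that could not be converted.
exceptions = df.loc[bad, "some_column"]
print(exceptions)

# Drop the exceptions and store the remaining values as floats.
cleaned = df.loc[~bad].assign(some_column=converted[~bad])
print(cleaned)
```

Here "2.5" is kept and converted to a float, while only "hi" is reported as an exception.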