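I'm starting out with a data science course that requires me to handle missing data, either by dropping the rows that contain NaN in the "price" subset or by replacing NaN with some mean value. However, neither my dropna() nor my replace() seems to work. What could be the problem?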

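I went through a lot of solutions on Stack Overflow, but my problem was not solved. I also searched pandas.pydata.org for a solution, where I learned about the different parameters of dropna(), such as thresh and how='any', but nothing helped.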

import pandas as pd

import numpy as np


url="https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df=pd.read_csv(url,header=None)


'''
Our data comes without any header or column names, hence we assign each column a header name.
'''


headers=["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style","drive-wheels","engnie-location","wheel-base","length","width","height","curb-weight","engine-type","num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm","city-mpg","highway-mpg","price"]
df.columns=headers


'''
Now we have to eliminate the rows containing NaN or ? in the "price" column of our data
'''

df.dropna(subset=["price"], axis=0, inplace=True)

df.head(12)

#or

df.dropna(subset=["price"], how='any')

df.head(12)

#also to replace

mean=df["price"].mean()

df["price"].replace(np.nan,mean)

df.head(12)


I expected all rows containing NaN or "?" in the "price" column to be removed by dropna() or replaced by replace(). However, the data does not seem to change.

Best answer

Use this code to remove the ? values:

df['price'] = pd.to_numeric(df['price'], errors='coerce')
df = df.dropna()


The to_numeric method converts its argument to a numeric type.

With errors='coerce', values that cannot be parsed as numbers are set to NaN.
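As a small illustration of what errors='coerce' does, the sketch below uses a made-up Series (the values are hypothetical, not read from the dataset) in which a missing entry is written as the string "?", as it appears to be in the question's file.

```python
import pandas as pd

# Hypothetical values; "?" stands in for a missing price, as in the question.
raw = pd.Series(["13495", "?", "16500"])

# The column holds strings, so pandas does not see any NaN here yet.
print(raw.isna().sum())

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
prices = pd.to_numeric(raw, errors="coerce")
print(prices.isna().sum())
```

This suggests why dropna() had nothing to remove in the question: a "?" string is an ordinary value until it is converted.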

dropna can then remove the records that contain NaN.
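Putting the steps together, here is a minimal sketch on a small inline table rather than the full dataset (the rows are invented for illustration). It assumes the question's code had no effect for two reasons: the "?" markers are strings rather than NaN, and the second dropna() call and the replace() call return new objects that were never assigned back. It restricts dropna() to the price column with subset, and uses fillna() for the mean replacement; both are choices made for this sketch, not part of the answer above.

```python
import io
import pandas as pd

# Invented sample rows; "?" marks a missing price, as in the question.
csv_text = "make,price\naudi,13950\nbmw,?\ndodge,6377\nisuzu,?\n"
df = pd.read_csv(io.StringIO(csv_text))

# Convert the column to numbers; "?" becomes NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Option 1: drop only the rows whose price is missing, and keep the result.
dropped = df.dropna(subset=["price"])

# Option 2: replace missing prices with the column mean,
# assigning the result back so the change is not lost.
filled = df.copy()
filled["price"] = filled["price"].fillna(filled["price"].mean())

print(len(dropped))
print(filled["price"].tolist())
```

Another option is to pass na_values="?" to pd.read_csv, so that the markers are read as NaN at load time and no conversion step is needed afterwards.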

Regarding python - Why can't the dropna() and replace() methods handle missing data in a DataFrame?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55968611/
