This article covers how to fill missing data in a pandas DataFrame by randomly choosing from the non-missing values.

Problem description

I have a pandas DataFrame with several missing values. I noticed that the non-missing values are close to each other, so I would like to impute each missing value by randomly choosing one of the non-missing values.

For example:

import pandas as pd
import random
import numpy as np

foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B':[np.nan, 4, 2, np.nan, 5]})
foo
    A   B
0   2 NaN
1   3   4
2 NaN   2   
3   5 NaN
4 NaN   5

For instance, I would like foo['A'][2] = 2 and foo['A'][4] = 3. The shape of my pandas DataFrame is (6940, 154). I tried this:

foo['A'] = foo['A'].fillna(random.choice(foo['A'].values.tolist()))

But it does not work. Could you help me achieve this? Best regards.

Recommended answer

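You can combine a boolean mask from isna() with numpy.random.choice to fill each missing value in a particular column with a value drawn at random from that column's non-missing values. fillna takes a scalar, dict, Series, or DataFrame rather than a function, and NaN never compares equal to anything, so the non-missing values are selected with dropna() instead of a comparison such as != np.nan.

import numpy as np

# Mark the missing entries, then draw one random non-missing value for each of them.
mask = df["column"].isna()
df.loc[mask, "column"] = np.random.choice(df["column"].dropna().to_numpy(), size=mask.sum())

Here "column" is the name of the column whose missing values you want to fill at random with its non-NaN values.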
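Since the DataFrame in the question has many columns, the same idea can be applied column by column. The attempt in the question, random.choice(foo['A'].values.tolist()), draws a single value once (possibly NaN itself, since the list still contains the missing entries), and fillna then writes that one value into every gap. The sketch below instead draws a separate value for each missing cell. The helper name fill_na_with_random_choice and its seed parameter are hypothetical, introduced only for illustration, and sampling is done with replacement.

```python
import numpy as np
import pandas as pd


def fill_na_with_random_choice(df, seed=None):
    """Return a copy of df in which each NaN is replaced by a value drawn at
    random (with replacement) from the non-missing values of the same column.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in out.columns:
        mask = out[col].isna()
        observed = out[col].dropna().to_numpy()
        # Skip columns with nothing to fill or nothing to draw from.
        if mask.any() and observed.size > 0:
            out.loc[mask, col] = rng.choice(observed, size=int(mask.sum()))
    return out


foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan],
                    'B': [np.nan, 4, 2, np.nan, 5]})
filled = fill_na_with_random_choice(foo, seed=0)
print(filled)
```

Passing a seed makes the draws reproducible. Columns that are entirely NaN are left unchanged, because there is nothing to sample from.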


10-19 02:20