I am trying to impute missing values in a pandas DataFrame using linear regression.

```python
for index in [missing_data_df.horsepower.index]:
    i = 0
    if pd.isnull(missing_data_df.horsepower[index[i]]):
            # linear regression equation
            a = (0.25743277 * missing_data_df.displacement[index[i]]
                 + 0.00958711 * missing_data_df.weight[index[i]]
                 + 25.874947903262651)
            # replacing "nan" values in dataframe using .set_value
            missing_data_df.set_value(index[i],"horsepower",a)
    i+=1
```
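The code runs, but the missing values (NaN) in the DataFrame are not replaced by the linear regression prediction stored in the variable `a`. Any suggestions?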

Here is the DataFrame containing the missing data:
```
   >>> missing_data_df:
       mpg cylinders  displacement  horsepower  weight  acceleration  \
10    NaN       4.0         133.0       115.0  3090.0          17.5
11    NaN       8.0         350.0       165.0  4142.0          11.5
12    NaN       8.0         351.0       153.0  4034.0          11.0
13    NaN       8.0         383.0       175.0  4166.0          10.5
14    NaN       8.0         360.0       175.0  3850.0          11.0
17    NaN       8.0         302.0       140.0  3353.0           8.0
38   25.0       4.0          98.0         NaN  2046.0          19.0
39    NaN       4.0          97.0        48.0  1978.0          20.0
133  21.0       6.0         200.0         NaN  2875.0          17.0
337  40.9       4.0          85.0         NaN  1835.0          17.3
343  23.6       4.0         140.0         NaN  2905.0          14.3
361  34.5       4.0         100.0         NaN  2320.0          15.8
367   NaN       4.0         121.0       110.0  2800.0          15.4
382  23.0       4.0         151.0         NaN  3035.0          20.5

       model_year origin                          car_name
10        70.0    2.0              citroen ds-21 pallas
11        70.0    1.0  chevrolet chevelle concours (sw)
12        70.0    1.0                  ford torino (sw)
13        70.0    1.0           plymouth satellite (sw)
14        70.0    1.0                amc rebel sst (sw)
17        70.0    1.0             ford mustang boss 302
38        71.0    1.0                        ford pinto
39        71.0    2.0       volkswagen super beetle 117
133       74.0    1.0                     ford maverick
337       80.0    2.0              renault lecar deluxe
343       80.0    1.0                ford mustang cobra
361       81.0    2.0                       renault 18i
367       81.0    2.0                         saab 900s
382       82.0    1.0                    amc concord dl
```

Best answer

You can use `apply` with a lambda for this:

import numpy as np

missing_data_df['horsepower'] = missing_data_df.apply(
    lambda row:
        0.25743277 * row.displacement + 0.00958711 * row.weight + 25.874947903262651
        if np.isnan(row.horsepower) else row.horsepower,
    axis=1)
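
If a row-wise `apply` feels heavy, a vectorized alternative is to compute the prediction for every row and pass it to `fillna`, which only fills the entries that are NaN. The following is a minimal sketch, not part of the original answer. It assumes a small sample DataFrame built from four rows of the question's data, and the name `predicted` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's DataFrame (only the columns used here).
missing_data_df = pd.DataFrame(
    {
        "displacement": [133.0, 98.0, 200.0, 85.0],
        "horsepower": [115.0, np.nan, np.nan, np.nan],
        "weight": [3090.0, 2046.0, 2875.0, 1835.0],
    },
    index=[10, 38, 133, 337],
)

# Regression prediction for every row, using the coefficients from the question.
# `predicted` is a hypothetical helper name, not something from the original post.
predicted = (
    0.25743277 * missing_data_df["displacement"]
    + 0.00958711 * missing_data_df["weight"]
    + 25.874947903262651
)

# fillna aligns on the index, so only NaN entries take the predicted value;
# rows that already have a horsepower value are left unchanged.
missing_data_df["horsepower"] = missing_data_df["horsepower"].fillna(predicted)

print(missing_data_df)
```

As for why the loop in the question changes nothing: `for index in [missing_data_df.horsepower.index]` iterates over a list holding the whole index object, so the body runs exactly once with `i = 0`. Only the first row (label 10, whose horsepower is 115.0) is ever checked, and it is not null. Note also that `DataFrame.set_value` was deprecated and then removed in pandas 1.0, so on current versions `.at` or `.loc` would be the usual replacement.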

Regarding "python - Imputing missing values using linear regression in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44097633/
