Question
I'm trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the error message:
ValueError: array must not contain infs or NaNs
I saw this SO question, which is similar but doesn't exactly answer my question: statsmodel.api.Logit: valueerror array must not contain infs or nans
What I would like to do is run the regression and ignore all rows where there are missing variables for the variables I am using in this regression. Right now I have:
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm
df = pd.read_csv('cl_030314.csv')
results = sm.ols(formula = "da ~ cfo + rm_proxy + cpi + year", data=df).fit()
I want something like missing = "drop". Any suggestions would be greatly appreciated. Thanks so much.
Recommended Answer
You answered your own question. Just pass
missing = 'drop'
to ols:
import statsmodels.formula.api as smf
...
results = smf.ols(formula="da ~ cfo + rm_proxy + cpi + year",
                  data=df, missing='drop').fit()
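Below is a minimal self-contained sketch of the call above. It uses a small made-up DataFrame whose column names mirror the question's, but the values are synthetic and purely illustrative. With missing='drop', any row that has a NaN in one of the formula's variables is excluded. The fitted model's nobs attribute confirms how many rows were actually used.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical values), with NaNs scattered across
# different columns in different rows, as described in the question.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "cfo": rng.normal(size=n),
    "rm_proxy": rng.normal(size=n),
    "cpi": rng.normal(size=n),
    "year": rng.integers(2000, 2010, size=n).astype(float),
})
df["da"] = 0.5 * df["cfo"] - 0.2 * df["rm_proxy"] + rng.normal(scale=0.1, size=n)

df.loc[[3, 10], "cfo"] = np.nan
df.loc[[7], "cpi"] = np.nan
df.loc[[10, 20], "da"] = np.nan  # row 10 is now missing in two columns

results = smf.ols(formula="da ~ cfo + rm_proxy + cpi + year",
                  data=df, missing="drop").fit()

# Rows 3, 7, 10 and 20 each contain at least one NaN, so 4 of 50 rows are dropped.
print(int(results.nobs))  # 46
```

A row counts as missing once, however many of its columns are NaN. That is why row 10 only removes one observation.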
If this doesn't work, then it's a bug; please report it on GitHub with a minimal working example (MWE).
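For comparison, here is a sketch of the manual equivalent using pandas' DataFrame.dropna(subset=...). It drops only the rows that are missing one of the regression's own variables and ignores NaNs in unused columns. The error message also mentions infs. As far as I know, missing='drop' checks for NaN only and does not remove ±inf values. Converting infinities to NaN first with replace is therefore an extra step assumed here; it is not part of the answer above. The data are synthetic, and other is a hypothetical unused column.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; "other" is a hypothetical column that the model does not use.
df = pd.DataFrame({
    "da":       [1.0, 2.1, np.nan, 3.9, 5.2, 6.1, 6.8, 8.3],
    "cfo":      [0.5, 1.0, 1.4, np.inf, 2.6, 3.1, 3.4, 4.2],
    "rm_proxy": [0.3, 0.1, 0.4, 0.2, np.nan, 0.6, 0.2, 0.5],
    "other":    [np.nan] * 8,
})
formula = "da ~ cfo + rm_proxy"
used = ["da", "cfo", "rm_proxy"]

# Convert +/-inf to NaN first (assumption: NaN-based dropping would not catch them).
df_finite = df.replace([np.inf, -np.inf], np.nan)

# Manual route: drop rows missing any *used* column; NaNs in "other" are ignored.
manual = smf.ols(formula, data=df_finite.dropna(subset=used)).fit()

# Built-in route: let statsmodels drop the incomplete rows.
auto = smf.ols(formula, data=df_finite, missing="drop").fit()

# Rows 2 (da), 3 (cfo was inf) and 4 (rm_proxy) are dropped: 8 - 3 = 5.
print(int(manual.nobs), int(auto.nobs))         # 5 5
print(np.allclose(manual.params, auto.params))  # True
```

The all-NaN other column does not remove any rows in either route, because only the variables named in the formula are checked.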
FYI, note the import above. Not everything is available in the formula.api namespace, so you should keep it separate from statsmodels.api. Or just use
import statsmodels.api as sm
sm.formula.ols(...)