Problem description
I want to run a regression in statsmodels that uses categorical variables and clustered standard errors.
I have a dataset with columns institution, treatment, year, and enrollment. Treatment is a dummy, institution is a string, and the others are numbers. I've made sure to drop any null values.
import statsmodels.formula.api as smf

df.dropna()
reg_model = smf.ols("enroll ~ treatment + C(year) + C(institution)", df).fit(
    cov_type='cluster', cov_kwds={'groups': df['institution']})
I get the following message:
ValueError: The weights and list don't have the same length.
Is there a way to fix this so my standard errors cluster?
Recommended answer
You need cov_type='cluster' in fit.
cov_type is a keyword argument, and it is not in the correct position when keywords are passed as positional arguments. See http://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.fit.html
In general, statsmodels does not guarantee backwards compatibility when keyword arguments are passed positionally; that is, keyword positions might change in future versions.
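As a minimal sketch of passing cov_type and cov_kwds by name, the example below fits the same formula on a small synthetic panel. The data, the institution names, and the choice to pass the groups as integer codes are assumptions made for illustration; they are not taken from the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the asker's data (all values are hypothetical).
rng = np.random.default_rng(0)
n_inst, n_years = 8, 5
names = [f"inst_{i}" for i in range(n_inst)]
df = pd.DataFrame({
    "institution": np.repeat(names, n_years),
    "year": np.tile(np.arange(2010, 2010 + n_years), n_inst),
})
# Assume the first four institutions are treated from 2012 onwards.
treated = df["institution"].isin(names[:4])
df["treatment"] = (treated & (df["year"] >= 2012)).astype(int)
df["enroll"] = 100 + 5 * df["treatment"] + rng.normal(size=len(df))

# dropna() returns a new frame, so keep the result and realign the index.
df = df.dropna().reset_index(drop=True)

# Integer group labels with one entry per row used in the regression.
groups = df["institution"].astype("category").cat.codes.to_numpy()

model = smf.ols("enroll ~ treatment + C(year) + C(institution)", data=df)
# cov_type and cov_kwds are passed by keyword, never by position.
reg_model = model.fit(cov_type="cluster", cov_kwds={"groups": groups})

print(reg_model.cov_type)
print(reg_model.bse["treatment"])
```

Passing the options by name keeps the call independent of the order of parameters in the fit signature.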
However, I don't understand where the ValueError is coming from. Python has very informative tracebacks, and when asking questions it is very useful to add either the full traceback or at least the last few lines that show where the exception is raised.
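One possible source of the length mismatch, offered as a guess rather than something the traceback confirms: df.dropna() returns a new DataFrame instead of modifying df, so the groups taken from df can have more rows than the data the model actually uses once rows with missing values are dropped. The sketch below uses a hypothetical four-row frame and calls np.bincount directly to show how arrays of different lengths raise a ValueError; that statsmodels reaches np.bincount internally for clustered errors is an assumption here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value.
df = pd.DataFrame({
    "institution": ["a", "a", "b", "b"],
    "enroll": [1.0, np.nan, 3.0, 4.0],
})

df.dropna()              # result is discarded; df itself is unchanged
n_before = len(df)       # still includes the row with the missing value

cleaned = df.dropna()    # keeping the result gives the shorter frame
n_after = len(cleaned)

# Group codes from the uncleaned frame, values from the cleaned one.
codes = df["institution"].astype("category").cat.codes.to_numpy()
weights = cleaned["enroll"].to_numpy()

try:
    np.bincount(codes, weights=weights)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_before, n_after)
print(error_message)
```

If this is the cause, assigning the result (df = df.dropna()) before building both the model and the groups keeps the two the same length.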