Question
I'm trying to run a fixed effects regression model in R. I want to control for heterogeneity in variables C and D (neither is a time variable).
I tried the following two approaches:
1) Using the plm package, which gives me the following error message:
formula = Y ~ A + B + C + D
reg = plm(formula, data = data, index = c('C','D'), model = 'within')
Error in pdim.default(index[[1]], index[[2]]) :
  duplicate couples (time-id)
I also tried first using
data_p = pdata.frame(data,index=c('C','D'))
But I have repeated observations in both columns.
2) Using factor() and lm, which works well:
formula = Y ~ A + B + factor(C) + factor(D)
reg = lm(formula, data= data)
What is the difference between the two methods? Why is plm not working for me? Is it because one of the indices should be time?
Recommended answer
That error is saying you have repeated id-time pairs formed by variables C and D: plm treats the first index as the individual and the second as time, and each combination may appear only once.
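A quick way to confirm that repeated pairs are the cause is to count them directly. The sketch below uses only base R and a small made-up data frame standing in for `data`; the values are hypothetical and only the column names C and D come from the question.

```r
# Hypothetical toy data standing in for `data`; only C and D matter here.
data <- data.frame(
  C = c("a", "a", "a", "b", "b"),
  D = c("x", "x", "y", "x", "y"),
  Y = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

# TRUE for every row whose (C, D) pair already appeared in an earlier row.
is_dup <- duplicated(data[, c("C", "D")])

# Number of repeated (C, D) pairs; plm needs this to be 0.
n_dup <- sum(is_dup)
print(n_dup)

# Cross-tabulation: any cell greater than 1 is a repeated pair.
pair_counts <- table(data$C, data$D)
print(pair_counts)
```

In this toy data the pair (a, x) occurs twice, which is the situation that makes plm stop.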
Let's say you have a third variable F which, jointly with C, distinguishes each individual (or your first dimension, whatever it is) from the others. Then with dplyr you can create a unique index, say id:
library(dplyr)
data$id <- data %>% group_by(C, F) %>% group_indices()
The index argument in plm then becomes index = c("id", "D").
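Putting the pieces together, a minimal sketch of the full workflow might look like the following. The data are simulated, so the values and the data-generating formula are assumptions, as is the choice of effect = "twoways", which the answer does not specify; it is used here because the question asks to control for both dimensions.

```r
library(dplyr)
library(plm)

set.seed(1)
# Hypothetical data: C x F identifies an individual, D is the second dimension.
data <- expand.grid(C = 1:3, F = 1:2, D = 1:4)
data$A <- rnorm(nrow(data))
data$B <- rnorm(nrow(data))
data$Y <- 1 + 2 * data$A - data$B + data$C + rnorm(nrow(data))

# One id per distinct (C, F) combination.
data$id <- data %>% group_by(C, F) %>% group_indices()

# Index names are passed as strings. C and D are left out of the formula
# because the individual and time effects absorb them.
reg <- plm(Y ~ A + B, data = data, index = c("id", "D"),
           model = "within", effect = "twoways")
summary(reg)
```

Each (id, D) pair occurs exactly once in this simulated data, so plm accepts the index without the duplicate-couples error.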
The lm + factor() approach is a solution only when you have distinct observations. If that is not the case, it will not properly weight the results within each id, that is, the fixed effect is not properly identified.
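For reference, with a single grouping variable the dummy-variable regression and the within (demeaning) transformation give the same slope estimate. The base-R sketch below illustrates this on simulated data; g, x and y are hypothetical names, and it covers only the one-way case, not the two-index setup from the question.

```r
set.seed(42)
# Simulated data; g, x and y are hypothetical names.
n <- 60
g <- sample(letters[1:5], n, replace = TRUE)
x <- rnorm(n)
y <- 0.5 * x + as.numeric(factor(g)) + rnorm(n)

# Dummy-variable (LSDV) estimate of the slope on x.
b_lsdv <- coef(lm(y ~ x + factor(g)))["x"]

# Within estimate: subtract group means, then regress without an intercept.
x_dm <- x - ave(x, g)
y_dm <- y - ave(y, g)
b_within <- coef(lm(y_dm ~ x_dm - 1))["x_dm"]

print(c(lsdv = unname(b_lsdv), within = unname(b_within)))
```

Running the same comparison on your own data, against the plm output, is one way to see whether the two approaches agree in your case.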