问题描述
我目前正在处理实验中的一些数据.因此,我有一些随机分配给2种不同治疗方法的个体的数据.对于每种治疗,我们进行了三个疗程.在每次会议中,都要求参与者做出一系列决定.
I'm currently working on some data from an experiment. Thus, I have data about some individuals who are randomly assigned to 2 different treatments. For each treatment, we ran three sessions. In each session, participants were asked to make a sequence of decisions.
我想做的是:(1)使用模型对治疗效果进行评估,该模型包括对个体及之后的随机效应,(2)按会话聚类标准错误.
What I would like to do is to:(1) estimate the effect of the treatment with a model that includes random effects on individuals and afterwards,(2) clustering the standard errors by session.
在R中,我可以使用 plm 软件包轻松估算随机效果模型:
In R, I can easily estimate the random effect model with the plm package:
model.plm<-plm(formula=DependentVar~TreatmentVar+SomeIndependentVars,data=data,
model="random",effect="individual")
我的问题是我无法通过可变会话(即个人参加的会话)对标准错误进行聚类.确实,plm软件包的鲁棒协方差矩阵估计器让我在两种类型的聚类之间进行选择: 组"和"时间".因此,如果我选择组"选项,则会得到标准错误,这些错误聚集在各个级别:
My problem is that I'm not able to cluster the standard errors by the variable session, i.e. the session the individuals participated in. Indeed the Robust Covariance Matrix Estimators of the plm package let me choose between 2 types of clusters: "groups" and "time". So, if I choose the option "group" I get standard errors clustered at the individual level:
vcovHC(model.plm,type="HC0",cluster="group")
是否可以选择其他聚类变量?
Is there a way to choose a different clustering variable ?
非常感谢您的帮助.
推荐答案
您可能对此感兴趣: https://stats.stackexchange.com/Questions/85909/为什么需要固定效果的ols需要唯一的时间元素
You may be interested in this:https://stats.stackexchange.com/questions/85909/why-does-a-fixed-effect-ols-need-unique-time-elements
这是我对内部"模型的解决方案:
Here's my solution for "within" models:
#' Fixed effect cluster regression, estimated efficiently using plm()
#' @param form The model formula.
#' @param data The data.frame.
#' @param index Character vector giving the column name indexing individual units.
#' @param cluster Character vector giving the column name indexing clusters, or "robust" to avoid the bootstrap and just return robust SE.
#' @param param A list of control parameters, with named elements as follows: R is the number of bootstrap replicates.
#' @return Coefficients plus clustered standard errors
feClusterRegress <- function( form, data, index, cluster = "robust", param = list( R = 30 ) ) {
if( "data.table" %in% class(data) ) data <- as.data.frame(data) # Not ideal efficiency-wise since I re-convert it later but necessary until I generalize the code to data.tables (the plm call doesn't work with them, for instance)
stopifnot( class(form)=="formula" )
mdl <- plm( form, data = data, model = "within", effect="individual", index = index )
if( cluster=="robust" ) {
res <- summary( mdl, robust=TRUE )
} else { # Bootstrap
require(foreach)
require(data.table)
# Prepare data structures for efficient sampling
clusters <- unique( data[[cluster]] )
if( is.null(clusters) ) stop("cluster must describe a column name that exists!")
clusterList <- lapply( clusters, function(x) which( data[[cluster]] == x ) )
names(clusterList) <- clusters
progressBar <- txtProgressBar( 0, param$R )
# Convert to data.table and drop extraneous variables
data <- as.data.table( data[ , c( all.vars(form), index ) ] ) # For faster sub-setting
# Sample repeatedly
coefList <- foreach( i = seq( param$R ) ) %dopar% {
setTxtProgressBar( progressBar, i )
clusterSample <- sample( as.character( clusters ), replace=TRUE )
indexSample <- unlist( clusterList[ clusterSample ], use.names=FALSE )
dataSample <- data[ indexSample, ]
dataSample[ , fakeTime := seq(.N), by = index ] # fakeTime is necessary due to a potential bug in plm. See https://stats.stackexchange.com/questions/85909/why-does-a-fixed-effect-ols-need-unique-time-elements
try( coefficients( plm( form, data = as.data.frame(dataSample), model = "within", effect="individual", index = c( index, "fakeTime") ) ) )
}
failed <- vapply( coefList, function(x) class(x) == "try-error", FUN.VALUE=NA )
if( any(failed) ) {
warning( "Some runs of the regression function failed" )
coefList <- coefList[ !failed ]
}
coefMat <- stack( coefList )
SE <- apply( coefMat, 2, sd )
res <- structure(
list(
cbind( coefficients( mdl ), SE ),
model = mdl
),
class = "feClusterPLM",
R = param$R
)
}
res
}
我怀疑您实际上需要这些变量,因此与其产生虚假的时间,不如生成一个伪"组,而只是在获取每个引导程序样本后立即组成一个新的组标识符.
I suspect you actually need the variables, so instead of generating a fake time generate a "fake" group--just make up a new group identifier right after you grab each bootstrap sample.
这篇关于具有标准误差的个体随机效应模型聚集在R中的不同变量上(R项目)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!