Problem description

I am running a mixed model in R. However, I am having some difficulty understanding which type of model I should be running for the data that I have.

Let's call the dependent variable the number of early button presses in a computerised experiment. An experiment is made up of multiple trials. In each trial a participant has to press a button to react to a target appearing on a screen. However, they may press the button too early, and this is what is measured as the outcome variable. So, for example, participant A may have 3 early button presses in total across the trials of an experiment, whereas participant B may have 15.

In a straightforward linear regression model using the lm command in R, I would treat this outcome as a continuous numerical variable. After all, it is a number that participants score in the experiment. However, I am not trying to run a linear regression; I am trying to run a mixed model with random effects. My understanding of mixed models in R is that the input data should be structured with one row per participant per trial. When the data is structured at trial level like this, I suddenly have a lot of 1s and 0s in my outcome column, because on any given trial a participant either accidentally presses the button too early (scoring a 1) or does not (scoring a 0).

Does this sound like something that needs to be treated as categorical? If so, would it then be modelled with the glmer function with family set to binomial?

Thanks

Recommended answer

As stated by Martin, this question seems to be more of a question for Cross Validated. But I'll throw in my 2 cents here.

The question often becomes what you're interested in learning from the experiment, and whether you have cause to believe that there is a random effect in your model. In your example you have 2 possible effects that could be random: the individuals and the trials. In classical random-effect models the random effects are often chosen based on a series of rules of thumb such as

  1. If the parameter can be thought of as random. This often refers to the levels within a factor changing. In this situation both the individuals and the trials are likely to change between experiments.
  2. If you're interested in the systematic effect (e.g. how much did A affect B), then the effect is not random and should be considered for the fixed effects. In your case, this is really only relevant if there are enough trials to see a systematic effect across individuals, but one could then question how relevant this effect would be for generalized results.

Several other rules of thumb exist, but this at least gives us a place to start. The next question becomes which effect we're actually interested in. In your case it is not quite clear, but it sounds like you're interested in one of the following.

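  1. For any given trial, how many early button presses can we expect?
  2. For any given individual, how many early button presses can we expect?
  3. In any given trial, how likely is an early button press?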

For the first 2, you can benefit from averaging over either individual or trial and using a linear mixed-effects model with the counterpart as a random effect. However, I would argue that a Poisson generalized linear (mixed) model is likely a better fit, as you are modelling counts, which cannot be negative. E.g., in a rather general sense, use:

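library(lme4)

# df is assumed to contain the raw data (one row per individual per trial)
# 1) Aggregate per individual. aggregate() needs a summary function;
#    FUN = sum is an assumption and only suits columns that can be summed.
df_agg <- aggregate(. ~ individual, data = df, FUN = sum)
# Note: lmer() stops if a grouping factor has as many levels as there are rows.
lmer(early_clicks ~ . - individual + (1 | individual), data = df_agg)
# or better:
glmer(early_clicks ~ . - individual + (1 | individual), family = poisson, data = df_agg)

# 2) Aggregate per trial (same assumption about FUN).
df_agg <- aggregate(. ~ trial, data = df, FUN = sum)
lmer(early_clicks ~ . - trial + (1 | trial), data = df_agg)
# or better:
glmer(early_clicks ~ . - trial + (1 | trial), family = poisson, data = df_agg)

# 3) Trial-level 0/1 outcome with crossed random intercepts.
#    trial and individual are dropped from the fixed effects, as in 1) and 2).
glmer(early_clicks ~ . - trial - individual + (1 | trial) + (1 | individual), family = binomial, data = df)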
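
To make the data layout behind these calls concrete, here is a minimal sketch on simulated data. Everything in it is hypothetical: the number of individuals and trials, the intercept, and the random-effect standard deviations are made up, and the model has no covariates. It only illustrates the trial-level 0/1 format, the per-individual counts, and the shape of the binomial call in 3).

```r
# Hypothetical simulated data: 20 individuals x 30 trials, one row per pair.
set.seed(1)
n_ind <- 20
n_trial <- 30
df <- expand.grid(individual = factor(seq_len(n_ind)),
                  trial = factor(seq_len(n_trial)))

# Random intercepts on the logit scale (standard deviations are made up).
u_ind <- rnorm(n_ind, sd = 1)
u_trial <- rnorm(n_trial, sd = 0.5)
eta <- -1.5 + u_ind[as.integer(df$individual)] + u_trial[as.integer(df$trial)]

# Trial-level outcome: 1 = early press, 0 = no early press.
df$early_clicks <- rbinom(nrow(df), size = 1, prob = plogis(eta))

# Counts per individual, i.e. the aggregated view used in 1).
counts_ind <- aggregate(early_clicks ~ individual, data = df, FUN = sum)

# Trial-level binomial GLMM with crossed random intercepts, as in 3).
# Only run if lme4 is installed.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(early_clicks ~ 1 + (1 | trial) + (1 | individual),
                     family = binomial, data = df)
  print(summary(fit))
}
```

The same trial-level data frame feeds both views: summing the 0/1 column gives the counts, while the unaggregated rows go straight into the binomial model.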

Note that we could use 3) to get answers for 1) and 2) by using 3) to predict probabilities and then using these to find the expected early_clicks. However, one can show theoretically that the estimation methods used in linear mixed models are exact, while this is not possible for generalized linear mixed models, whose likelihood has to be approximated. As such, the results may differ slightly (or quite substantially) between the models. In 3) especially, the number of random effects may be quite substantial compared to the number of observations, and the model may in practice be impossible to estimate.
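
As a rough illustration of going from probabilities to expected counts, the sketch below uses made-up logit-scale values in place of real estimates; none of these numbers come from a fitted model. With an actual glmer fit, predict(fit, type = "response") would supply the per-row probabilities instead.

```r
# Made-up logit-scale quantities standing in for estimates from model 3).
intercept <- -1.5
u_individual <- c(A = -0.8, B = 0.9)       # random intercepts per individual
u_trial <- c(-0.3, 0.0, 0.2, 0.4, -0.1)    # random intercepts per trial

# Linear predictor for every individual x trial combination (2 x 5 matrix).
eta <- intercept + outer(u_individual, u_trial, "+")

# Probability of an early press in each individual-trial cell.
p <- plogis(eta)

# Expected number of early presses: sum the probabilities.
expected_per_individual <- rowSums(p)   # question 2)
expected_per_trial <- colSums(p)        # question 1)

print(expected_per_individual)
print(expected_per_trial)
```

The expected count is simply the sum of the Bernoulli probabilities over the cells being aggregated, which is why the trial-level model can answer the count questions as well.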

I have only very briefly gone over some principles, and while they may serve as a short introduction, they are by no means exhaustive. In the last 15 to 20 years the theory and practice of mixed-effects models have been extended substantially. If you'd like more information about mixed-effects models, I'd suggest starting at the GLMM FAQ site by Ben Bolker (and others) and the references listed there. For estimation and implementation I suggest reading the vignettes of the lme4, glmmTMB and possibly merTools packages, glmmTMB being a more recent and interesting project.
