问题描述
我试图理解R中AIC/BIC的结果.由于某些原因,R将要估算的参数数量加1.因此,R使用与2 * p - 2 * logLik
不同的公式(在高斯情况下,logLik
是残差平方和).实际上,它使用:2 * (p + 1) - 2 * logLik
.
I am trying to understand the results from AIC/BIC in R. For some reason R adds 1 to the number of parameters to be estimated. Hence R uses a different formula than 2 * p - 2 * logLik
(in Gaussian case logLik
is residual sum of squares). In fact it uses: 2 * (p + 1) - 2 * logLik
.
经过研究,我发现问题与stats:::logLik.lm()
有关.
After a research, I found the problem is related to stats:::logLik.lm()
.
> stats:::logLik.lm ## truncated R function body
## ...
## attr(val, "df") <- p + 1
## ...
作为一个真实示例(使用R的内置数据集trees
),请考虑:
As a real example (using R's built-in dataset trees
), consider:
x <- lm(Height ~ Girth, trees) ## a model with 2 parameters
logLik(x)
## 'log Lik.' -96.01663 (df=3)
这真令人困惑.有人知道为什么吗?
This is really puzzling. Anyone knows why?
glm
@ crayfish44的示例
glm
examples by @crayfish44
model.g <- glm(dist ~ speed, cars, family=gaussian)
logLik(model.g) # df=3
model.p <- glm(dist ~ speed, cars, family=poisson)
logLik(model.p) #df=2
model.G <- glm(dist ~ speed, cars, family=Gamma)
logLik(model.G) #df=3
Edit2:logLik
methods of logLik
> methods(logLik)
[1] logLik.Arima* logLik.glm* logLik.lm* logLik.logLik* logLik.nls*
推荐答案
当我们决定检查stats:::logLik.lm
时,我们实际上很接近答案.如果我们进一步检查stats:::logLik.glm
(感谢@ crayfish44提供的glm示例:朋友,您真棒.您再一次给了我灵感,因为上一篇有关persp()
和trans3d()
的帖子.谢谢!),我们将解决该问题.
We were actually really close to the answer, when we decided to inspect stats:::logLik.lm
. Had we further inspect stats:::logLik.glm
(Thanks to the glm example by @crayfish44: Mate, you are awesome. Once again you give me inspiration, since the last post regarding persp()
and trans3d()
. Thanks!), we would have solved the issue.
使用:::
的陷阱是,我们无法查看该代码的注释.因此,我决定检查R-3.3.0的源文件.您可以打开文件R-3.3.0/src/library/stats/R/logLik.R
来查看通用功能logLik.**
的注释代码.
The pitfall of using :::
, is that we are unable to view comments for the code. So I've decided to check the source file of R-3.3.0. You can open the file R-3.3.0/src/library/stats/R/logLik.R
to view commented code for generic functions logLik.**
.
## log-likelihood for glm objects
logLik.glm <- function(object, ...)
{
if(!missing(...)) warning("extra arguments discarded")
fam <- family(object)$family
p <- object$rank
## allow for estimated dispersion
if(fam %in% c("gaussian", "Gamma", "inverse.gaussian")) p <- p + 1
val <- p - object$aic / 2
## Note: zero prior weights have NA working residuals.
attr(val, "nobs") <- sum(!is.na(object$residuals))
attr(val, "df") <- p
class(val) <- "logLik"
val
}
注意以下几行:
p <- object$rank
## allow for estimated dispersion
if(fam %in% c("gaussian", "Gamma", "inverse.gaussian")) p <- p + 1
p
是秩检测后模型系数的有效数量.
p
is the effect number of model coefficients after rank-detection.
- 当我们具有
"gaussian()"
,"Gamma()"
和"inverse.gaussian()"
响应时,自由度加1,因为我们需要估计指数分布的色散参数. - 对于"
binomial()
"和"poisson()
"响应,色散参数已知为1,因此无需估算.
- When we have
"gaussian()"
,"Gamma()"
and"inverse.gaussian()"
response, the degree of freedom is added 1, as we need the estimation of dispersion parameter of exponential distribution. - For "
binomial()
" and "poisson()
" response, the dispersion parameter is known to be 1, hence needs not be estimated.
也许?logLik
应该考虑解释这一点,以防有些像我们这样愚蠢的人!
Maybe ?logLik
should consider explaining this, in case there are some ones as foolish as we are!
这篇关于logLik.lm():为什么R为自由度使用(p + 1)而不是p?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!