r - mgcv_1.8-24:bam()的“fREML”或“REML”方法给出了错误的解释偏差

使用方法“ fREML”和“ REML”将相同的模型与bam拟合得到了接近的结果，但是所解释的偏差与summary.gam所返回的差异很大。使用“ fREML”时，数量为〜3.5％（不好），而使用“ REML”时，数量为〜50％（还不错）。怎么可能呢？哪一个是正确的？不幸的是，我无法提供一个简单的可复制示例。######################################### method = "fREML", discrete = TRUE #########################################Family: binomialLink function: logitFormula:ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")Parametric coefficients: Estimate Std. Error z value Pr(>|z|)(Intercept) -5.0026 0.2199 -22.75 <2e-16Approximate significance of smooth terms: edf Ref.df Chi.sq p-values(Var1) 1.00 1.001 17.54 2.82e-05s(RandomVar) 16.39 19.000 145.03 < 2e-16R-sq.(adj) = 0.00349 Deviance explained = 3.57%fREML = 2.8927e+05 Scale est. = 1 n = 312515########################################## method = "fREML", discrete = FALSE ##########################################Family: binomialLink function: logitFormula:ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")Parametric coefficients: Estimate Std. Error z value Pr(>|z|)(Intercept) -4.8941 0.2207 -22.18 <2e-16Approximate significance of smooth terms: edf Ref.df Chi.sq p-values(Var1) 1.008 1.016 17.44 3.09e-05s(RandomVar) 16.390 19.000 144.86 < 2e-16R-sq.(adj) = 0.00349 Deviance explained = 3.57%fREML = 3.1556e+05 Scale est. = 1 n = 312515####################################################### method = "REML", discrete method not applicable #######################################################Family: binomialLink function: logitFormula:ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")Parametric coefficients: Estimate Std. Error z value Pr(>|z|)(Intercept) -4.8928 0.2205 -22.19 <2e-16Approximate significance of smooth terms: edf Ref.df Chi.sq p-values(Var1) 1.156 1.278 16.57 8.53e-05s(RandomVar) 16.379 19.000 142.60 < 2e-16R-sq.(adj) = 0.0035 Deviance explained = 50.8%-REML = 3.1555e+05 Scale est. = 1 n = 312515 最佳答案此问题可以追溯到mgcv_1.8-23。它的changlog读取：* bam extended family extension had introduced a bug in null deviance computation for Gaussian additive case when using methods other than fREML or GCV.Cp. Fixed.现在证明，该修补程序对高斯情况成功，但对非高斯情况不成功。让我首先提供一个可复制的示例，因为您的问题没有一个。set.seed(0)x <- runif(1000)## the linear predictor is a 3rd degree polynomialp <- binomial()$linkinv(0.5 + poly(x, 3) %*% rnorm(3) * 20)## p is well spread out on (0, 1); check `hist(p)`y <- rbinom(1000, 1, p)library(mgcv)#Loading required package: nlme#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.fREML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "fREML")REML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "REML")GCV <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "GCV.Cp")## explained.deviance = (null.deviance - deviance) / null.deviance## so in this example we get negative explained deviance for "REML" methodunlist(REML[c("null.deviance", "deviance")])#null.deviance deviance# 181.7107 1107.5241unlist(fREML[c("null.deviance", "deviance")])#null.deviance deviance# 1357.936 1107.524unlist(GCV[c("null.deviance", "deviance")])#null.deviance deviance# 1357.936 1108.108Null偏差不能小于偏差（TSS不能小于RSS），因此bam的“ REML”方法无法在此处返回正确的Null偏差。我在mgcv_1.8-24/R/bam.r的第1350行找到了问题：object$family <- object$fitted.values <- NULL其实应该是object$null.deviance <- object$fitted.values <- NULL对于“ GCV.Cp”和“ fREML”以外的方法，在将大型bam模型矩阵简化为gam矩阵（n x p：数据数量；：系数数）。由于此新模型矩阵没有自然的解释，因此p x p返回的许多数量应无效（除估计的平滑参数外）。 Simon键入n是一个错字。我构建了一个补丁版本，事实证明可以修复该错误。我将告诉Simon在下一个版本中对其进行修复。关于r - mgcv_1.8-24:bam()的“fREML”或“REML”方法给出了错误的解释偏差，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/51523009/