问题描述
很抱歉,答案很明显,但是我花了很多时间尝试在mgcv.gam中使用自定义链接功能
Apologies if the answer is obvious but I've spent quite some time trying to use a custom link function in mgcv.gam
简而言之,
- 我想使用来自软件包 psyphy (我想使用 psyphy.probit_2asym ,我称为
custom_link
) -
我可以使用此链接创建一个{stats} family对象,并将其用于glm的"family"参数中.
- I want to use a modified probit link from package psyphy ( I want to use psyphy.probit_2asym, I call it
custom_link
) I can create a {stats}family object with this link and use it in the 'family' argument of glm.
m <- glm(y~x, family=binomial(link=custom_link), ... )
用作{mgcv} gam
It does not work when used as an argument for {mgcv}gam
m <- gam(y~s(x), family=binomial(link=custom_link), ... )
我收到错误Error in fix.family.link.family(family) : link not recognised
如果指定标准的link=probit
,我不会得到此错误的原因,无论是glm还是gam都能正常工作.
I do not get the reason for this error, both glm and gam work if I specify the standard link=probit
.
所以我的问题可以概括为:
So my question can be summarized as:
此自定义链接中缺少哪些适用于glm但不适用于gam的东西?
如果您能给我提示我应该做什么,请提前感谢.
Thanks in advance if you can give me a hint on what I should do.
链接功能
probit.2asym <- function(g, lam) {
if ((g < 0 ) || (g > 1))
stop("g must in (0, 1)")
if ((lam < 0) || (lam > 1))
stop("lam outside (0, 1)")
linkfun <- function(mu) {
mu <- pmin(mu, 1 - (lam + .Machine$double.eps))
mu <- pmax(mu, g + .Machine$double.eps)
qnorm((mu - g)/(1 - g - lam))
}
linkinv <- function(eta) {
g + (1 - g - lam) *
pnorm(eta)
}
mu.eta <- function(eta) {
(1 - g - lam) * dnorm(eta) }
valideta <- function(eta) TRUE
link <- paste("probit.2asym(", g, ", ", lam, ")", sep = "")
structure(list(linkfun = linkfun, linkinv = linkinv,
mu.eta = mu.eta, valideta = valideta, name = link),
class = "link-glm")
}
推荐答案
如您所知,glm
采用迭代加权最小二乘法拟合迭代. gam
的早期版本通过拟合经惩罚的加权加权最小二乘法对此进行了扩展,该功能由gam.fit
函数完成.在某些情况下,这称为性能迭代.
As you may know, glm
takes iteratively reweighted least squares fitting iterations. Early version of gam
extends this by fitting an iteratively penalized reweighted least squares, which is done by the gam.fit
function. This is known as performance iteration in some context.
自2008年以来(甚至可能更早),基于外部迭代的gam.fit3
已将gam.fit
替换为gam
的默认设置.此类更改确实需要该家族的一些其他信息,您可以从中获得有关?fix.family.link
的信息.
Since 2008 (or maybe slightly even earlier), gam.fit3
based on what is called outer iteration has replaced gam.fit
as gam
default. Such change does require some extra information of the family, regarding which you can read about ?fix.family.link
.
两次迭代之间的主要区别是系数beta
的迭代和平滑参数lambda
的迭代是否嵌套.
The major difference between two iterations is whether iteration of coefficients beta
and iteration of smoothing parameters lambda
are nested.
- 性能迭代采用嵌套方式,其中对于
beta
的每次更新,都会执行lambda
的单个迭代; - 外部迭代将这2个迭代完全分开,对于
beta
的每次更新,lambda
的迭代一直进行到结束,直到收敛为止.
- Performance iteration takes the nested way, where for each update of
beta
, a single iteration oflambda
is performed; - Outer iteration completely separate those 2 iterations, where for each update of
beta
, iteration oflambda
is carried to the end till convergence.
显然,外部迭代更稳定,并且收敛失败的可能性较小.
Obviously outer iteration is more stable and less likely to suffer from failure of convergence.
gam
具有参数optimizer
.默认情况下,它采用optimizer = c("outer", "newton")
,这是外部迭代的牛顿方法.但是如果设置optimizer = "perf"
,将需要性能迭代.
gam
has an argument optimizer
. By default it takes optimizer = c("outer", "newton")
, that is the newton method of outer iteration; but if you set optimizer = "perf"
, it will take performance iteration.
因此,在完成以上概述之后,我们有两个选择:
So, after the above overview, we have two options:
- 仍然使用外部迭代,但扩展了自定义链接功能;
- 使用性能迭代来与
glm
保持一致.
- still use outer iteration, but expand your customized link function;
- use performance iteration to stay in line with
glm
.
我很懒,因此将演示第二个(实际上,我对采取第一个方法并不感到太自信).
I am being lazy so will demonstrate the second one (actually I am not feeling too confident to take the first approach).
可复制示例
您没有提供可复制的示例,因此我准备如下示例.
You did not provide a reproducible example, so I prepare one as below.
set.seed(0)
x <- sort(runif(500, 0, 1)) ## covariates (sorted to make plotting easier)
eta <- -4 + 3 * x * exp(x) - 2 * log(x) * sqrt(x) ## true linear predictor
p <- binomial(link = "logit")$linkinv(eta) ## true probability (response)
y <- rbinom(500, 1, p) ## binary observations
table(y) ## a quick check that data are not skewed
# 0 1
#271 229
我将使用您打算使用的功能probit.2asym
的g = 0.1
和lam = 0.1
:
I will take g = 0.1
and lam = 0.1
of the function probit.2asym
you intend to use:
probit2 <- probit.2asym(0.1, 0.1)
par(mfrow = c(1,3))
## fit a glm with logit link
glm_logit <- glm(y ~ x, family = binomial(link = "logit"))
plot(x, eta, type = "l", main = "glm with logit link")
lines(x, glm_logit$linear.predictors, col = 2)
## glm with probit.2asym
glm_probit2 <- glm(y ~ x, family = binomial(link = probit2))
plot(x, eta, type = "l", main = "glm with probit2")
lines(x, glm_probit2$linear.predictors, col = 2)
## gam with probit.2aysm
library(mgcv)
gam_probit2 <- gam(y ~ s(x, bs = 'cr', k = 3), family = binomial(link = probit2),
optimizer = "perf")
plot(x, eta, type = "l", main = "gam with probit2")
lines(x, gam_probit2$linear.predictors, col = 2)
我对s(x)
使用了自然三次样条曲线基cr
,因为对于单变量平滑,不需要使用薄板样条线的默认设置.我还设置了较小的基本尺寸k = 3
(对于三次样条曲线不能较小),因为我的玩具数据接近线性,并且不需要较大的基本尺寸.更重要的是,这似乎可以防止玩具数据集性能迭代收敛失败.
I have used natural cubic spline basis cr
for s(x)
, as for univariate smooth the default setting with thin-plate spline is unnecessary. I have also set a small basis dimension k = 3
(can't be smaller for a cubic spline) as my toy data is near linear and big basis dimension is not needed. More importantly, this seems to prevent convergence failure of performance iteration for my toy dataset.
这篇关于自定义链接功能适用于GLM,但不适用于mgcv GAM的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!