Problem description
This problem concerns the dots ('.'): dot-prefixed placeholders end up in the stored glm call and cause problems there.
purrr::map is wonderful for subgroup analysis and/or model comparison. However, when it is used with glm, the stored call gets mangled, which causes problems downstream, e.g. when computing pseudo-R2s. The reason is that update does not work with the mangled call, so pscl::pR2 cannot refit the base model and compute its log-likelihood.
pacman::p_load(tidyverse)
#sample data
pacman::p_load(ISLR)
mydata = ISLR::Default
#nest data, students and non-students
Default_nested = Default %>% group_by(student) %>% nest
#fit glms
formul= default ~income+balance
glms = Default_nested %>%
mutate(model=map(data,glm,formula=formul,family='binomial'))
# pscl::pR2 throws an error here
pacman::p_load(pscl)
glms %>% mutate(pr2=map(model,pR2))
Now we can take a look at the first submodel. Its call looks strange (formula = ..1, data = .x[[i]]), even though the formula element contains the right formula:
> glms$model[[1]]$call
.f(formula = ..1, family = "binomial", data = .x[[i]])
> glms$model[[1]]$formula
default ~ income + balance
> glms$model[[1]]$data
# A tibble: 7,056 x 3
default balance income
<fct> <dbl> <dbl>
1 No 730. 44362.
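The mechanism can be reproduced without purrr or pscl. The sketch below is a minimal stand-in under stated assumptions: it uses a simulated data frame (hypothetical columns g, x, y) instead of ISLR::Default, and base R's lapply() instead of purrr::map(). glm() stores the call it was invoked with, which here is lapply's internal FUN(...) call, and update() re-evaluates that stored call in the caller's environment, where FUN does not exist.

```r
# Hypothetical simulated data standing in for ISLR::Default
set.seed(1)
df <- data.frame(g = rep(c("a", "b"), each = 100), x = rnorm(200))
df$y <- rbinom(200, size = 1, prob = plogis(0.5 * df$x))

# Fit one logistic regression per group through a mapping function
fits <- lapply(split(df, df$g), glm, formula = y ~ x, family = binomial)

# The stored call is lapply's internal call, FUN(...), not glm(...)
print(fits[[1]]$call)

# update() re-evaluates the stored call in the calling environment, where
# FUN is not defined, so refitting the base model fails
err <- tryCatch(update(fits[[1]], . ~ 1), error = function(e) e)
print(conditionMessage(err))
```

With purrr::map() the same thing happens; the stored call just starts with .f instead of FUN.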
What is the cleanest way to use pscl::pR2 when the tibble holds many glm objects (more than the two in this example)?
Overview of solution strategies:
(A) "Fix" the glm object so that update can be applied to it:
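# rebuild each stored call by hand; note that data = quote(Default) refers to the full data set
glms %>% mutate(model = map(model, function(x) {
  x$call <- rlang::call2("glm", formula = x$formula, data = quote(Default), family = 'binomial'); x })) %>%
  mutate(pr2 = map(model, pR2)) %>% unnest(pr2)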
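This runs, but the computed R2 values are off, most likely because data = quote(Default) makes update refit the base model on the full Default data (students and non-students combined) instead of on each subgroup. As written, this solution strategy is probably a dead end.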
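A sketch of how the rebuilt call could avoid the full-data problem, assuming the cause is data = quote(Default): inline each subgroup's own data into the call. It uses simulated data and a hypothetical helper fix_call(). Base R's call() evaluates its arguments, so the formula, family object and data frame are stored as values, and update() can re-evaluate the call from anywhere. The check compares the base-model log-likelihood obtained through update() with a direct intercept-only fit on the same subgroup.

```r
# Hypothetical simulated data standing in for ISLR::Default
set.seed(1)
df <- data.frame(g = rep(c("a", "b"), each = 100), x = rnorm(200))
df$y <- rbinom(200, size = 1, prob = plogis(0.5 * df$x))

fits <- lapply(split(df, df$g), glm, formula = y ~ x, family = binomial)

# Hypothetical helper: rebuild the stored call from the fitted object's own
# formula, family and data (the subgroup, not the full data set)
fix_call <- function(fit) {
  fit$call <- call("glm", formula = fit$formula, family = fit$family,
                   data = fit$data)
  fit
}
fixed <- lapply(fits, fix_call)

# update() can now refit the intercept-only base model for each subgroup
ll_update <- sapply(fixed, function(f) as.numeric(logLik(update(f, . ~ 1))))
ll_direct <- sapply(split(df, df$g), function(d)
  as.numeric(logLik(glm(y ~ 1, data = d, family = binomial))))
print(cbind(ll_update, ll_direct))
```

The trade-off is that every stored call now carries a full copy of its subgroup's data, which makes printed calls bulky.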
(B) Write a wrapper for `glm`, as proposed by Artem. This should work fine. Downside: the stored calls look ugly, because the data are inlined into them.
I expanded on Artem's proposed solution to create glm3:
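glm3 <- function(formula, data, family) {
  # inject the captured data argument and the formula/family values into a glm() call, then evaluate it
  eval(rlang::expr(glm(!!rlang::enexpr(data),
                       formula = !!formula,
                       family = !!family)))
}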
glms3 <- Default_nested %>%
  mutate(model = map(data, glm3, formula = formul, family = 'binomial'),
         pr2 = map(model, pR2))
glms3 %>% unnest(pr2)
(C) In this particular case (pseudo-R2s), simply write a better pseudo-R2 function. Since pseudo-R2 is probably the only major statistic that does not work within purrr::map, this may actually make sense. I put together the psr2glm function:
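psr2glm <- function(glmobj) {
  # Refit the intercept-only base model with glm() directly instead of update(),
  # using the formula, data and family stored in the fitted object.
  # update.formula() only rewrites the formula (y ~ x becomes y ~ 1); it does not
  # re-evaluate the mangled call. This also works for responses such as cbind(a, b).
  L.base <- as.numeric(logLik(
    glm(formula = update.formula(glmobj$formula, . ~ 1),
        data = glmobj$data,
        family = glmobj$family)))
  n <- length(glmobj$residuals)
  # as.numeric() drops the "logLik" class so the results are plain numbers
  L.full <- as.numeric(logLik(glmobj))
  D.full <- -2 * L.full
  D.base <- -2 * L.base
  G2 <- -2 * (L.base - L.full)
  data.frame(McFadden = 1 - L.full / L.base,
             CoxSnell = 1 - exp(-G2 / n),
             Nagelkerke = (1 - exp((D.full - D.base) / n)) / (1 - exp(-D.base / n)))
}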
It works:
glms = Default_nested %>%
mutate(model=map(data,glm,formula=formul,family='binomial'))
glms %>% mutate(pr2=map(model,psr2glm)) %>% unnest(pr2)
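For reference, these are the three measures psr2glm returns, written with ℓ for log-likelihoods and D = -2ℓ as in the code (this D coincides with the glm deviance only for 0/1 responses). The Nagelkerke value is the Cox-Snell value rescaled by its maximum attainable value:

$$
\begin{aligned}
R^2_{\text{McFadden}} &= 1 - \frac{\ell_{\text{full}}}{\ell_{\text{base}}} \\
G^2 &= -2\,(\ell_{\text{base}} - \ell_{\text{full}}) = D_{\text{base}} - D_{\text{full}} \\
R^2_{\text{Cox-Snell}} &= 1 - \exp\!\left(-\frac{G^2}{n}\right) \\
R^2_{\text{Nagelkerke}} &= \frac{1 - \exp\!\left(\frac{D_{\text{full}} - D_{\text{base}}}{n}\right)}{1 - \exp\!\left(-\frac{D_{\text{base}}}{n}\right)} = \frac{R^2_{\text{Cox-Snell}}}{1 - \exp\!\left(-\frac{D_{\text{base}}}{n}\right)}
\end{aligned}
$$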
I am considering proposing changes to DescTools::PseudoR2; however, I first need to check whether the solution is general.
The key idea is to skip update and instead call glm directly. All the required information (formula, data, family) is stored in the glm object, even when the model was fitted within purrr::map. A nice side effect of psr2glm returning a data frame: the output of unnest is tidy, with one column per measure.
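Taking "everything is already in the glm object" one step further, under an explicit assumption: for a logistic regression on an unweighted 0/1 response (like default here), with an intercept and no offset, the saturated log-likelihood is 0, so logLik = -deviance/2, and the base model's deviance is already stored as null.deviance. The sketch below uses a hypothetical helper psr2_dev() and simulated data to compute the same three measures without any refit, then cross-checks against an explicit intercept-only fit. For grouped binomial responses (cbind(successes, failures)) the deviance and -2 logLik differ, and this shortcut does not apply.

```r
# Hypothetical helper: pseudo-R2s from stored deviances. Valid only for
# unweighted 0/1 responses with an intercept and no offset, where
# logLik = -deviance / 2 and null.deviance is the intercept-only deviance.
psr2_dev <- function(fit) {
  n <- nobs(fit)
  D.full <- fit$deviance
  D.base <- fit$null.deviance
  G2 <- D.base - D.full
  data.frame(McFadden = 1 - D.full / D.base,
             CoxSnell = 1 - exp(-G2 / n),
             Nagelkerke = (1 - exp(-G2 / n)) / (1 - exp(-D.base / n)))
}

# Hypothetical simulated data
set.seed(1)
df <- data.frame(x = rnorm(200))
df$y <- rbinom(200, size = 1, prob = plogis(0.5 * df$x))
fit <- glm(y ~ x, data = df, family = binomial)

# Cross-check against an explicit intercept-only refit
L.full <- as.numeric(logLik(fit))
L.base <- as.numeric(logLik(glm(y ~ 1, data = df, family = binomial)))
print(psr2_dev(fit))
print(1 - L.full / L.base)
```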
(D) Change either glm or update. Since the glm object actually contains all the necessary information, one could consider the observed behavior a bug that should be fixed in base R.
Recommended answer
One way is to define a wrapper for glm() that puts the data directly inside the call by manually constructing the expression and then evaluating it:
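glm2 <- function(.df, ...) {
  # inject the captured data argument and splice the remaining arguments into a glm() call, then evaluate it
  eval(rlang::expr(glm(!!rlang::enexpr(.df), !!!list(...))))
}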
glms <- Default_nested %>%
mutate( model = map(data,glm2,formula=formul,family="binomial"),
pr2 = map(model,pscl::pR2) )
# # A tibble: 2 x 4
# student data model pr2
# <fct> <list> <list> <list>
# 1 No <tibble [7,056 × 3]> <glm> <dbl [6]>
# 2 Yes <tibble [2,944 × 3]> <glm> <dbl [6]>
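A base-R variant of the same idea is sketched below, assuming only stats is available (simulated data again; glm_inline() is a hypothetical name). do.call() builds the call from already-evaluated arguments, so the stored call starts with glm and contains the data frame itself rather than a reference such as .x[[i]]; update(), and therefore refits of the base model, then work. As with glm2, the price is a bulky call when printed.

```r
# Hypothetical simulated data standing in for ISLR::Default
set.seed(1)
df <- data.frame(g = rep(c("a", "b"), each = 100), x = rnorm(200))
df$y <- rbinom(200, size = 1, prob = plogis(0.5 * df$x))

# Hypothetical wrapper: do.call() inlines the evaluated data and arguments
glm_inline <- function(data, ...) do.call("glm", list(data = data, ...))

fits <- lapply(split(df, df$g), glm_inline, formula = y ~ x,
               family = "binomial")

# The stored call now starts with glm and carries its own data
print(fits[[1]]$call[[1]])

# update() works, so the base model can be refit from the stored call
null_fit <- update(fits[[1]], . ~ 1)
direct <- glm(y ~ 1, data = split(df, df$g)[[1]], family = binomial)
print(c(update = as.numeric(logLik(null_fit)),
        direct = as.numeric(logLik(direct))))
```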
Verification:
## Perform the computation by hand and ensure that it's identical to glms$pr2
glm(Default_nested$data[[1]], formula=default~income+balance, family="binomial") %>%
pscl::pR2() %>% identical( glms$pr2[[1]] ) # TRUE
glm(Default_nested$data[[2]], formula=default~income+balance, family="binomial") %>%
pscl::pR2() %>% identical( glms$pr2[[2]] ) # TRUE