Fractional response regression in R

Question

I am trying to model data in which the response variable lies between 0 and 1, so I have decided to use a fractional response model in R. As I currently understand it, the fractional response model is similar to logistic regression, but it uses a quasi-likelihood method to estimate the parameters. I am not sure I understand this correctly.

So far I have tried frm from the package frm, and glm, on the following data, which is the same data as in this OP:

library(foreign)
mydata <- read.dta("k401.dta")

Further, I followed the procedure in this OP, in which glm is used. However, frm returns different standard errors (SE) on the same dataset:

library(frm)
y <- mydata$prate
x <- mydata[,c('mrate', 'age', 'sole', 'totemp1')]
myfrm <- frm(y, x, linkfrac = 'logit')

frm returns:

*** Fractional logit regression model ***

           Estimate Std. Error t value Pr(>|t|)
INTERCEPT  1.074062   0.048902  21.963    0.000 ***
mrate      0.573443   0.079917   7.175    0.000 ***
age        0.030895   0.002788  11.082    0.000 ***
sole       0.363596   0.047595   7.639    0.000 ***
totemp1   -0.057799   0.011466  -5.041    0.000 ***

Note: robust standard errors

Number of observations: 4734
R-squared: 0.124

With glm, I use:

myglm <- glm(prate ~ mrate + totemp1 + age + sole, data = mydata, family = quasibinomial('logit'))
summary(myglm)

Call:
glm(formula = prate ~ mrate + totemp1 + age + sole, family = quasibinomial("logit"),
    data = mydata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.1214  -0.1979   0.2059   0.4486   0.9146

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.074062   0.047875  22.435  < 2e-16 ***
mrate        0.573443   0.048642  11.789  < 2e-16 ***
totemp1     -0.057799   0.011912  -4.852 1.26e-06 ***
age          0.030895   0.003148   9.814  < 2e-16 ***
sole         0.363596   0.051233   7.097 1.46e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 0.2913876)

    Null deviance: 1166.6  on 4733  degrees of freedom
Residual deviance: 1023.7  on 4729  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 6

Which one should I rely on? Is it better to use glm instead of frm, since I have seen in the OP that the estimated SEs can differ?

Recommended answer

The differences between the two approaches stem from different degrees-of-freedom corrections in the computation of the robust standard errors. With matching defaults, the results are identical. See the following example:

library(foreign)
library(frm)
library(sandwich)
library(lmtest)

df <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")
df$prate <- df$prate/100

y <- df$prate
x <- df[,c('mrate', 'age', 'sole', 'totemp')]

myfrm <- frm(y, x, linkfrac = 'logit')

*** Fractional logit regression model ***

           Estimate Std. Error t value Pr(>|t|)
INTERCEPT  0.931699   0.084077  11.081    0.000 ***
mrate      0.952872   0.137079   6.951    0.000 ***
age        0.027934   0.004879   5.726    0.000 ***
sole       0.340332   0.080658   4.219    0.000 ***
totemp    -0.000008   0.000003  -2.701    0.007 ***

Now the GLM:

myglm <- glm(prate ~ mrate + totemp + age + sole,
             data = df, family = quasibinomial('logit'))
coeftest(myglm, vcov.=vcovHC(myglm, type="HC0"))

z test of coefficients:

                 Estimate    Std. Error z value              Pr(>|z|)
(Intercept)  0.9316994257  0.0840772572 11.0815 < 0.00000000000000022 ***
mrate        0.9528723652  0.1370808798  6.9512     0.000000000003623 ***
totemp      -0.0000082352  0.0000030489 -2.7011              0.006912 **
age          0.0279338963  0.0048785491  5.7259     0.000000010291017 ***
sole         0.3403324262  0.0806576852  4.2195     0.000024488075931 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

With HC0, the standard errors are identical. That is, frm uses HC0 by default. See this post for an extensive discussion. The defaults used by sandwich are probably better in some situations, though I suspect it does not matter much in general. You can already see this from your results: the differences are numerically very small.
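
To make the role of the correction concrete, here is a minimal base-R sketch that builds the HC0 sandwich estimator by hand for a fractional logit and then applies the HC1 scaling n/(n - k). It is an illustration under stated assumptions, not the internals of frm or sandwich: the data are simulated (not the 401k data), the names x1, x2, bread, meat, vcov_hc0 and vcov_hc1 are made up for this example, and the formulas assume the canonical logit link with unit prior weights. Under those assumptions the HC0 column should correspond to what vcovHC(fit, type = "HC0") computes.

```r
# Simulated fractional response data (hypothetical; NOT the 401k data)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
mu_true <- plogis(0.5 + 0.8 * x1 - 0.3 * x2)
# Beta-distributed response with mean mu_true, so y lies in (0, 1)
y <- rbeta(n, shape1 = 5 * mu_true, shape2 = 5 * (1 - mu_true))

# Fractional logit via quasi-likelihood
fit <- glm(y ~ x1 + x2, family = quasibinomial("logit"))

X  <- model.matrix(fit)
mu <- fitted(fit)
k  <- ncol(X)

# "Bread": inverse of X' W X, with W = mu * (1 - mu) for the logit link
bread <- solve(crossprod(X * sqrt(mu * (1 - mu))))
# "Meat": sum of outer products of the scores x_i * (y_i - mu_i)
meat <- crossprod(X * (y - mu))

vcov_hc0 <- bread %*% meat %*% bread   # no small-sample correction
vcov_hc1 <- vcov_hc0 * n / (n - k)     # HC1 degrees-of-freedom correction

se <- cbind(
  model = sqrt(diag(vcov(fit))),       # model-based SEs, as in summary(fit)
  HC0   = sqrt(diag(vcov_hc0)),
  HC1   = sqrt(diag(vcov_hc1))
)
print(round(se, 4))
```

The HC1 column is the HC0 column multiplied by sqrt(n / (n - k)), so with thousands of observations and a handful of regressors the two are nearly indistinguishable. The model column is not a sandwich estimate at all: it scales the bread by the estimated dispersion, which is why it can differ more visibly from the robust columns. Note also that vcovHC defaults to type = "HC3", so calling it without a type argument will not reproduce HC0.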
