Question
I am trying to model data in which the response variable lies between 0 and 1, so I have decided to use a fractional response model in R. From my current understanding, the fractional response model is similar to logistic regression, but it uses a quasi-likelihood method to estimate the parameters. I am not sure I understand it correctly.
So far I have tried frm (from the frm package) and glm on the following data, which is the same as in this OP:
library(foreign)
mydata <- read.dta("k401.dta")
Further, I followed the procedure in this OP, in which glm is used. However, on the same dataset, frm returns different standard errors (SE):
library(frm)
y <- mydata$prate
x <- mydata[,c('mrate', 'age', 'sole', 'totemp1')]
myfrm <- frm(y, x, linkfrac = 'logit')
frm returns:
*** Fractional logit regression model ***
Estimate Std. Error t value Pr(>|t|)
INTERCEPT 1.074062 0.048902 21.963 0.000 ***
mrate 0.573443 0.079917 7.175 0.000 ***
age 0.030895 0.002788 11.082 0.000 ***
sole 0.363596 0.047595 7.639 0.000 ***
totemp1 -0.057799 0.011466 -5.041 0.000 ***
Note: robust standard errors
Number of observations: 4734
R-squared: 0.124
With glm, I use:
myglm <- glm(prate ~ mrate + totemp1 + age + sole, data = mydata, family = quasibinomial('logit'))
summary(myglm)
Call:
glm(formula = prate ~ mrate + totemp1 + age + sole, family = quasibinomial("logit"),
data = mydata)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.1214 -0.1979 0.2059 0.4486 0.9146
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.074062 0.047875 22.435 < 2e-16 ***
mrate 0.573443 0.048642 11.789 < 2e-16 ***
totemp1 -0.057799 0.011912 -4.852 1.26e-06 ***
age 0.030895 0.003148 9.814 < 2e-16 ***
sole 0.363596 0.051233 7.097 1.46e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 0.2913876)
Null deviance: 1166.6 on 4733 degrees of freedom
Residual deviance: 1023.7 on 4729 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 6
Which one should I rely on? Is it better to use glm instead of frm, given that I have seen in the OP that the estimated SEs can differ?
Recommended answer
The differences between the two approaches stem from different degrees-of-freedom corrections in the computation of the robust standard errors. With matching defaults, the results are identical. See the following example:
library(foreign)
library(frm)
library(sandwich)
library(lmtest)
df <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")
df$prate <- df$prate/100
y <- df$prate
x <- df[,c('mrate', 'age', 'sole', 'totemp')]
myfrm <- frm(y, x, linkfrac = 'logit')
*** Fractional logit regression model ***
Estimate Std. Error t value Pr(>|t|)
INTERCEPT 0.931699 0.084077 11.081 0.000 ***
mrate 0.952872 0.137079 6.951 0.000 ***
age 0.027934 0.004879 5.726 0.000 ***
sole 0.340332 0.080658 4.219 0.000 ***
totemp -0.000008 0.000003 -2.701 0.007 ***
Now the GLM:
myglm <- glm(prate ~ mrate + totemp + age + sole,
data = df, family = quasibinomial('logit'))
coeftest(myglm, vcov.=vcovHC(myglm, type="HC0"))
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.9316994257 0.0840772572 11.0815 < 0.00000000000000022 ***
mrate 0.9528723652 0.1370808798 6.9512 0.000000000003623 ***
totemp -0.0000082352 0.0000030489 -2.7011 0.006912 **
age 0.0279338963 0.0048785491 5.7259 0.000000010291017 ***
sole 0.3403324262 0.0806576852 4.2195 0.000024488075931 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
With HC0, the standard errors are identical. That is, frm uses HC0 by default. See this post for an extensive discussion. The defaults used by sandwich are probably better in some situations, though I would suspect that it does not matter much in general. You can already see this from your results: the differences are numerically very small.
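To make the degrees-of-freedom correction concrete, here is a minimal sketch in base R that builds the HC0 sandwich estimator by hand for a fractional logit and then rescales it by n / (n - k), which is the HC1 correction. The data are simulated and every name (x, y, fit, vc_hc0, and so on) is illustrative; this is not the internal code of frm or sandwich, only an assumption about what the HC0 formula amounts to for a logit link.

```r
# Simulated fractional response; all names and values here are illustrative.
set.seed(1)
n  <- 500
x  <- rnorm(n)
mu <- plogis(0.5 + 0.8 * x)
y  <- rbeta(n, shape1 = 5 * mu, shape2 = 5 * (1 - mu))  # response in (0, 1)

# Fractional logit fitted by quasi-likelihood
fit <- glm(y ~ x, family = quasibinomial("logit"))

X <- model.matrix(fit)  # n x k design matrix
p <- fitted(fit)        # fitted means
k <- ncol(X)

# With the logit link, the score contribution of observation i is x_i * (y_i - p_i)
bread <- solve(crossprod(X * sqrt(p * (1 - p))))  # (X' W X)^{-1}, W = diag(p(1 - p))
meat  <- crossprod(X * (y - p))                   # sum of score outer products

vc_hc0 <- bread %*% meat %*% bread  # HC0: no small-sample correction
vc_hc1 <- vc_hc0 * n / (n - k)      # HC1: degrees-of-freedom rescaling

se_hc0   <- sqrt(diag(vc_hc0))
se_hc1   <- sqrt(diag(vc_hc1))
se_model <- sqrt(diag(vcov(fit)))   # model-based SEs as printed by summary(fit)

print(cbind(HC0 = se_hc0, HC1 = se_hc1, model = se_model))
```

The HC1 standard errors exceed the HC0 ones by the constant factor sqrt(n / (n - k)), which is close to 1 when n is large relative to the number of coefficients. That is why the choice of correction rarely changes conclusions in samples of several thousand observations.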