Problem description
Both R functions, multinom (package nnet) and mlogit (package mlogit), can be used for multinomial logistic regression. But why does this example return different p-values for the coefficients?
# prepare data
library(nnet)
library(mlogit)
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
mydata$gre[1:10] = rnorm(10,mean=80000)
# multinom:
test = multinom(admit ~ gre + gpa + rank, data = mydata)
z <- summary(test)$coefficients/summary(test)$standard.errors
# For simplicity, use z-test to approximate t test.
pv <- (1 - pnorm(abs(z)))*2
pv
# (Intercept) gre gpa rank2 rank3 rank4
# 0.00000000 0.04640089 0.00000000 0.00000000 0.00000000 0.00000000
# mlogit:
mldata = mlogit.data(mydata,choice = 'admit', shape = "wide")
mlogit.model1 <- mlogit(admit ~ 1 | gre + gpa + rank, data = mldata)
summary(mlogit.model1)
# Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
# 1:(intercept) -3.5826e+00 1.1135e+00 -3.2175 0.0012930 **
# 1:gre 1.7353e-05 8.7528e-06 1.9825 0.0474225 *
# 1:gpa 1.0727e+00 3.1371e-01 3.4195 0.0006274 ***
# 1:rank2 -6.7122e-01 3.1574e-01 -2.1258 0.0335180 *
# 1:rank3 -1.4014e+00 3.4435e-01 -4.0697 4.707e-05 ***
# 1:rank4 -1.6066e+00 4.1749e-01 -3.8482 0.0001190 ***
Why are the p-values from multinom and mlogit so different? I guess it is because of the outliers I added using mydata$gre[1:10] = rnorm(10,mean=80000). If outliers are an inevitable issue (for example in genomics, metabolomics, etc.), which R function should I use?
Recommended answer
The difference here is the difference between the Wald $z$ test (what you calculated in pv) and the likelihood ratio (LR) test (what is returned by summary(mlogit.model1)). The Wald test is computationally simpler, but in general has less desirable properties (for example, its confidence intervals are not scale-invariant). You can read more about the two procedures here.
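To make the two procedures concrete, here is a minimal sketch in base R. It uses a simulated binary outcome and glm rather than the question's data, so the variable names and coefficient values are assumptions made purely for illustration. It computes a Wald p-value and an LR p-value for the same coefficient.

```r
# Simulated data (hypothetical values, not the question's dataset)
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.4 * x))

fit  <- glm(y ~ x, family = binomial)  # model with the predictor
null <- glm(y ~ 1, family = binomial)  # nested model without it

# Wald z test: estimate divided by its standard error
wald_z <- coef(summary(fit))["x", "z value"]
wald_p <- 2 * pnorm(abs(wald_z), lower.tail = FALSE)

# Likelihood ratio test: drop in deviance between the nested models,
# referred to a chi-squared distribution with 1 degree of freedom
lr_stat <- deviance(null) - deviance(fit)
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(wald = wald_p, lr = lr_p))
```

With well-behaved data the two p-values are usually close; they can diverge when a predictor is badly scaled or the likelihood is far from quadratic.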
To perform LR tests on your nnet model coefficients, you can load the car and lmtest packages and call Anova(test) (though you will have to do a little more work for the single-df tests).
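One way the "little more work" for a single-df test might look is sketched below. It uses only nnet and simulated stand-in data (the data frame sim, its coefficient values, and the maxit setting are assumptions for illustration, not the question's data). The idea is to refit the model without the term of interest and compare the deviance components that multinom stores on the fitted objects.

```r
library(nnet)

# Simulated stand-in for the admissions data (hypothetical values)
set.seed(1)
n <- 400
sim <- data.frame(
  gre  = rnorm(n, mean = 580, sd = 115),
  gpa  = rnorm(n, mean = 3.4, sd = 0.4),
  rank = factor(sample(1:4, n, replace = TRUE))
)
eta <- -3.5 + 0.002 * sim$gre + 0.8 * sim$gpa - 0.5 * (as.integer(sim$rank) - 1)
sim$admit <- rbinom(n, size = 1, prob = plogis(eta))

# Full model and a nested model that drops the single-df term gre
full    <- multinom(admit ~ gre + gpa + rank, data = sim,
                    trace = FALSE, maxit = 500)
reduced <- multinom(admit ~ gpa + rank, data = sim,
                    trace = FALSE, maxit = 500)

# LR statistic: difference in residual deviance, 1 degree of freedom
lr_stat <- reduced$deviance - full$deviance
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(statistic = lr_stat, p.value = lr_p))
```

The same pattern applies to any single coefficient: drop that term, refit, and compare deviances. For a multi-level factor such as rank, the degrees of freedom would be the number of dropped coefficients rather than 1.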