Problem description
I wanted to compute a simple regression using lm and plain matrix algebra. However, the regression coefficients I obtain from matrix algebra are only half of those obtained from lm, and I have no clue why.
Here is the code:
boot_example <- data.frame(
x1 = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
x2 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
x3 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L),
x4 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
x5 = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
x6 = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L),
preference_rating = c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)
)
dummy_regression <- lm("preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)
dummy_regression
Call:
lm(formula = "preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)
Coefficients:
(Intercept)           x1           x2           x3           x4           x5           x6
     4.2222       1.0000      -0.3333       1.0000       0.6667       2.3333       1.3333
###The same by matrix algebra
X <- matrix(c(
c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), #upper var
c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L), #upper var
c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), #country var
c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), #country var
c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), #price var
c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L) #price var
),
nrow = 9, ncol=6)
Y <- c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)
#Using standardized (mean=0, std=1) "z" -transformation Z = (X-mean(X))/sd(X) for all predictors
X_std <- apply(X, MARGIN = 2, FUN = function(x){(x-mean(x))/sd(x)})
##If constant shall be computed as well, uncomment next line
#X_std <- cbind(c(rep(1,9)),X_std)
#using matrix algebra formula
solve(t(X_std) %*% X_std) %*% (t(X_std) %*% Y)
[,1]
[1,] 0.5000000
[2,] -0.1666667
[3,] 0.5000000
[4,] 0.3333333
[5,] 1.1666667
[6,] 0.6666667
Does anyone see the error in my matrix computation?
Thanks in advance!
Recommended answer
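lm does not standardize the predictors. Each of your dummy columns has a sample standard deviation of 0.5, so fitting on the standardized columns scales every coefficient by 0.5. To reproduce the lm result with matrix algebra, use the unstandardized X and add an intercept column: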
X1 <- cbind(1, X) ## include intercept
solve(crossprod(X1), crossprod(X1,Y))
# [,1]
#[1,] 4.2222222
#[2,] 1.0000000
#[3,] -0.3333333
#[4,] 1.0000000
#[5,] 0.6666667
#[6,] 2.3333333
#[7,] 1.3333333
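The factor of one half can be checked directly: a slope fitted on a standardized predictor equals the raw slope multiplied by that predictor's sample standard deviation, and every column here has three ones among nine values, which gives sd = 0.5. The following is a minimal sketch that rebuilds X and Y from the question; the names sds, b_raw and b_std are introduced here only for illustration.

```r
# Rebuild the design matrix and response from the question (column-major fill).
X <- matrix(c(
  1, 1, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 1, 1, 1, 0, 0, 0,
  1, 0, 0, 1, 0, 0, 1, 0, 0,
  0, 1, 0, 0, 1, 0, 0, 1, 0,
  1, 0, 0, 0, 0, 1, 0, 1, 0,
  0, 1, 0, 1, 0, 0, 0, 0, 1
), nrow = 9, ncol = 6)
Y <- c(9, 7, 5, 6, 5, 6, 5, 7, 6)

# Sample standard deviation of each predictor column
sds <- apply(X, 2, sd)

# Raw slopes from the normal equations with an intercept column (intercept dropped)
X1 <- cbind(1, X)
b_raw <- drop(solve(crossprod(X1), crossprod(X1, Y)))[-1]

# Slopes on the standardized predictors, as computed in the question
X_std <- scale(X)
b_std <- drop(solve(crossprod(X_std), crossprod(X_std, Y)))

# Compare the standardized slopes with the raw slopes times the column sds
all.equal(b_std, b_raw * sds)
```

No intercept column is needed in the standardized fit to recover these slopes, because centered columns are orthogonal to a constant column.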
I won't repeat here why we should use crossprod. See the "follow-up" part of the question Ridge regression with glmnet gives different coefficients than what I compute by "textbook definition"?
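As a rough illustration of the crossprod point: crossprod(A) gives the same matrix as t(A) %*% A without forming the transpose explicitly, and passing the right-hand side to solve() avoids computing an explicit inverse. The sketch below uses made-up data; A, y and the b_* names are hypothetical and not part of the original question.

```r
# Made-up data: an intercept column plus two random predictors
set.seed(1)
A <- cbind(1, matrix(rnorm(20), nrow = 10))
y <- rnorm(10)

# Textbook form: explicit transpose and explicit inverse
b_textbook <- solve(t(A) %*% A) %*% (t(A) %*% y)

# crossprod form: solve the normal equations as a linear system
b_crossprod <- solve(crossprod(A), crossprod(A, y))

# The two forms should agree up to floating-point tolerance
all.equal(b_textbook, b_crossprod)

# Cross-check against lm, suppressing its own intercept since A already has one
b_lm <- unname(coef(lm(y ~ A - 1)))
all.equal(drop(b_crossprod), b_lm)
```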