本文介绍了使用QR分解的R中的多元回归分析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试编写一个使用QR分解解决多元回归的函数.输入:y向量和X矩阵;输出:b,e,R ^ 2.到目前为止,我已经明白了这一点,并被深深地卡住了.我想我已经使一切变得太复杂了:

I am trying to write a function for solving multiple regression using QR decomposition. Input: y vector and X matrix; output: b, e, R^2. So far I`ve got this and am terribly stuck; I think I have made everything way too complicated:

QR.regression <- function(y, X) {
X <- as.matrix(X)
y <- as.vector(y)
p <- as.integer(ncol(X))
if (is.na(p)) stop("ncol(X) is invalid")
n <- as.integer(nrow(X))
if (is.na(n)) stop("nrow(X) is invalid")
nr <- length(y)
nc <- NCOL(X)

# Householder
for (j in seq_len(nc)) {
id <- seq.int(j, nr)
sigma <- sum(X[id, j]^2)
s <- sqrt(sigma)
diag_ej <- X[j, j]
gamma <- 1.0 / (sigma + abs(s * diag_ej))
kappa <- if (diag_ej < 0) s else -s
X[j,j] <- X[j, j] - kappa
if (j < nc)
for (k in seq.int(j+1, nc)) {
yPrime <- sum(X[id,j] * X[id,k]) * gamma
X[id,k] <- X[id,k] - X[id,j] * yPrime
}
yPrime <- sum(X[id,j] * y[id]) * gamma
y[id] <- y[id] - X[id,j] * yPrime
X[j,j] <- kappa
} # end of Householder transformation

rss <- sum(y[seq.int(nc+1, nr)]^2)  # residuals sum of squares
e <- rss/nr
e <- mean(residuals(QR.regression)^2)
beta <- solve(t(X) %*% X, t(X) %*% y)
for (i in seq_len(ncol(X))) # set zeros in the lower triangular side of X
X[seq.int(i+1, nr),i] <- 0
Rsq <- (X[1:nc,1:nc])^2
return(list(Rsq=Rsq, y = y, beta = beta, e = e))
}


UPDATE:
my.QR <- function(y, X) {
X <- as.matrix(X)
y <- as.vector(y)
p <- as.integer(ncol(X))
if (is.na(p)) stop("ncol(X) is invalid")
n <- as.integer(nrow(X))
if (is.na(n)) stop("nrow(X) is invalid")
qr.X <- qr(X)
b <- solve(t(X) %*% X, t(X) %*% y)
e <- as.vector(y - X %*% beta) #e
R2 <- (X[1:p, 1:p])^2
return(list(b = b, e= e, R2 = R2 ))
}

X <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
y <- c(1,2,3,4)
my.QR(X, y)

推荐答案

这完全取决于您允许使用R的内置功能来解决此问题的数量.我已经知道不允许lm,所以这里是故事的其余部分.

It all depends on how much of R's built-in facility you are allowed to use to solve this problem. I already know that lm is not allowed, so here is the rest of the story.

如果允许您使用lm

If you are allowed to use any other routines than lm

然后,您可以简单地使用lm.fit.lm.fitlsfit进行基于QR的普通最小二乘法求解.

Then you can simply use lm.fit, .lm.fit or lsfit for QR-based ordinary least squares solving.

lm.fit(X, y)
.lm.fit(X, y)
lsfit(X, y, intercept = FALSE)

其中,.lm.fit是最轻的,而lm.fitlsfit非常相似.以下是我们可以通过.lm.fit进行的操作:

Among those, .lm.fit is the most light-weighed, while lm.fit and lsfit are pretty similar. The following is what we can do via .lm.fit:

f1 <- function (X, y) {
  z <- .lm.fit(X, y)
  RSS <- crossprod(z$residuals)[1]
  TSS <- crossprod(y - mean(y))[1]
  R2 <- 1 - RSS / TSS
  list(coefficients = z$coefficients, residuals = z$residuals, R2 = R2)
  }

在同班同学的问题中:通过奇异值分解来求解普通最小二乘的玩具R函数,已经使用它来检查SVD方法的正确性.

In the question by your fellow classmate: Toy R function for solving ordinary least squares by singular value decomposition, I have already used this to check the correctness of SVD approach.

如果不允许您使用R的内置QR因式分解程序qr.default

If you are not allowed to use R's built-in QR factorization routine qr.default

如果不允许使用.lm.fit,但是允许使用qr.default,那么它也不那么复杂.

If .lm.fit is not allowed, but qr.default is, then it is also not that complicated.

f2 <- function (X, y) {
  ## QR factorization `X = QR`
  QR <- qr.default(X)
  ## After rotation of `X` and `y`, solve upper triangular system `Rb = Q'y`
  b <- backsolve(QR$qr, qr.qty(QR, y))
  ## residuals
  e <- as.numeric(y - X %*% b)
  ## R-squared
  RSS <- crossprod(e)[1]
  TSS <- crossprod(y - mean(y))[1]
  R2 <- 1 - RSS / TSS
  ## multiple return
  list(coefficients = b, residuals = e, R2 = R2)
  }

如果您进一步想要估计系数的方差-协方差,请遵循如何使用R中的QR分解计算最小二乘估计量的方差? /a>.

If you further want variance-covariance of estimated coefficients, follow How to calculate variance of least squares estimator using QR decomposition in R?.

如果甚至不允许您使用qr.default

If you are not even allowed to use qr.default

然后,我们必须自己编写QR分解. 在R代码中编写Householder QR因式分解功能就是这样.

Then we have to write QR decomposition ourselves. Writing a Householder QR factorization function in R code is giving this.

在其中使用功能myqr,我们可以编写

Using the function myqr there, we can write

f3 <- function (X, y) {
  ## our own QR factorization
  ## complete Q factor is not required
  QR <- myqr(X, complete = FALSE)
  Q <- QR$Q
  R <- QR$R
  ## rotation of `y`
  Qty <- as.numeric(crossprod(Q, y))
  ## solving upper triangular system
  b <- backsolve(R, Qty)
  ## residuals
  e <- as.numeric(y - X %*% b)
  ## R-squared
  RSS <- crossprod(e)[1]
  TSS <- crossprod(y - mean(y))[1]
  R2 <- 1 - RSS / TSS
  ## multiple return
  list(coefficients = b, residuals = e, R2 = R2)
  }

f3并不是非常有效,因为我们明确地形成了Q,即使它是瘦Q因素.原则上,我们应该将yX的QR因式分解一起旋转,这样就不必形成Q.

f3 is not extremely efficient, as we have formed Q explicitly, even though it is the thin-Q factor. In principle, we should rotate y along with the QR factorization of X, thus Q needs not be formed.

如果要修复现有代码

这需要一些调试工作,因此需要一些时间.稍后我将对此做出另一个答案.

This requires some debugging effort so would take some time. I will make another answer regarding this later.

这篇关于使用QR分解的R中的多元回归分析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-20 09:06
查看更多