How to obtain regression coefficients and model fit using a correlation or covariance matrix instead of a data frame in R?


Question

I want to be able to obtain regression coefficients from a multiple linear regression by supplying a correlation or covariance matrix instead of a data.frame. I realise you lose some information relevant to determining the intercept and so on, but even the correlation matrix should be sufficient for getting standardised coefficients and estimates of variance explained.

So, for example, if you had the following data:

# get some data
library(MASS)
data("Cars93")
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]

You could run a regression as follows:

lm(EngineSize ~ Horsepower + RPM, x)

But what if, instead of having the data, you had only the correlation matrix or the covariance matrix:

corx <- cor(x)
covx <- cov(x)

What function in R allows you to run a regression based on the correlation or covariance matrix? Ideally it should be similar to lm, so that you can easily obtain things like r-squared, adjusted r-squared, predicted values and so on. Presumably, for some of these things, you would also need to provide the sample size and possibly a vector of means. But that would also be fine.

That is, something like:

      lm(EngineSize ~ Horsepower + RPM, cov = covx) # obviously this doesn't work
      

Note that this answer on Stats.SE provides a theoretical explanation for why it's possible, and provides an example of some custom R code for calculating coefficients.
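For reference, the standard least-squares identities behind that explanation can be written compactly. The notation here is introduced only for illustration and is not taken from the linked answer: $S_{xx}$ is the covariance matrix of the predictors, $s_{xy}$ the vector of covariances between each predictor and the response, $s_{yy}$ the variance of the response, $\bar{x}$ and $\bar{y}$ the means, $n$ the sample size, $p$ the number of predictors, and $R_{xx}$, $r_{xy}$ the corresponding blocks of the correlation matrix.

```latex
\begin{align*}
\hat{\beta}        &= S_{xx}^{-1} s_{xy}                                  && \text{(slopes)} \\
\hat{\beta}_0      &= \bar{y} - \bar{x}^{\top} \hat{\beta}                && \text{(intercept, needs the means)} \\
\hat{\beta}^{*}    &= R_{xx}^{-1} r_{xy}                                  && \text{(standardised slopes)} \\
R^2                &= \frac{s_{xy}^{\top} S_{xx}^{-1} s_{xy}}{s_{yy}}
                    = r_{xy}^{\top} R_{xx}^{-1} r_{xy}                    && \text{(variance explained)} \\
R^2_{\text{adj}}   &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}              && \text{(needs the sample size)}
\end{align*}
```

This matches the intuition in the question: the correlation matrix alone gives standardised coefficients and $R^2$, the covariance matrix adds unstandardised slopes, the means add the intercept, and the sample size is needed for adjusted $R^2$ and standard errors.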

Recommended answer

Using lavaan you could do the following:

      library(MASS)
      data("Cars93")
      x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]
      
      lav.input <- cov(x)        # covariance matrix used as the model input
      lav.mean <- colMeans(x)    # means, needed to estimate the intercept
      
      library(lavaan)
      m1 <- 'EngineSize ~ Horsepower + RPM'
      fit <- sem(m1, sample.cov = lav.input, sample.nobs = nrow(x),
                 meanstructure = TRUE, sample.mean = lav.mean)
      summary(fit, standardized = TRUE)
      

The result is:

      Regressions:
                         Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
        EngineSize ~
          Horsepower          0.015    0.001   19.889    0.000      0.015    0.753
          RPM                -0.001    0.000  -15.197    0.000     -0.001   -0.576
      
      Intercepts:
                        Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
         EngineSize          5.805    0.362   16.022    0.000      5.805    5.627
      
      Variances:
                        Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
          EngineSize          0.142    0.021    6.819    0.000      0.142    0.133
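If installing lavaan is not an option, the same point estimates can be recovered in base R from the summary statistics alone. The following is a minimal sketch, not part of the original answer: `lm_from_cov` is a hypothetical helper written for this illustration (it does not exist in any package), and it assumes a sample covariance matrix with row and column names, a named vector of means, and the sample size.

```r
# Sketch: OLS point estimates from summary statistics only.
library(MASS)                      # used here only to build example inputs
data("Cars93")
x <- Cars93[, c("EngineSize", "Horsepower", "RPM")]

covx  <- cov(x)                    # sample covariance matrix
meanx <- colMeans(x)               # named vector of means
n     <- nrow(x)                   # sample size

# Hypothetical helper (an assumption for illustration, not a package function).
# S: covariance matrix, means: named mean vector, n: sample size,
# y: name of the response, xs: names of the predictors.
lm_from_cov <- function(S, means, n, y, xs) {
  Sxx <- S[xs, xs, drop = FALSE]   # predictor covariance block
  sxy <- S[xs, y]                  # covariances of predictors with response
  slopes <- as.numeric(solve(Sxx, sxy))            # Sxx^{-1} sxy
  names(slopes) <- xs
  intercept <- means[[y]] - sum(means[xs] * slopes)
  r2 <- sum(sxy * slopes) / S[y, y]                # variance explained
  p <- length(xs)
  adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
  list(coefficients  = c("(Intercept)" = intercept, slopes),
       r.squared     = r2,
       adj.r.squared = adj_r2)
}

fit_cov <- lm_from_cov(covx, meanx, n, "EngineSize", c("Horsepower", "RPM"))
fit_cov$coefficients
fit_cov$r.squared
```

When the raw data are available, the result can be checked against `coef(lm(EngineSize ~ Horsepower + RPM, x))`. The sketch returns point estimates and fit measures only; standard errors and predicted values would need further work, which is where a package such as lavaan is more convenient.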
      

