Screening for (multi)collinearity in a regression model


Question

I hope this is not going to be an "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in a regression model. How to cure it... well, sometimes you don't need to "cure" collinearity, since it doesn't affect the regression model itself, only the interpretation of the effects of individual predictors.

One way to spot collinearity is to take each predictor in turn as the dependent variable, with the other predictors as independent variables, determine R, and, if it is larger than .9 (or .95), consider that predictor redundant. This is one "method"... what about other approaches? Some of them are time-consuming, like excluding predictors from the model and watching for changes in the b-coefficients - they should be noticeably different.
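That screening regression can be scripted directly in R. The sketch below (with simulated data and made-up variable names x1, x2, x3) regresses each predictor on all the others and collects the R² values - the same quantity that underlies the variance inflation factor, VIF = 1/(1 − R²):

```r
## Sketch: regress each predictor on all the others and collect R^2.
## x1, x2, x3 are simulated here purely for illustration.
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + 2*x2 + rnorm(100)*0.0001   # x3 is nearly a linear comb. of x1, x2

X <- data.frame(x1, x2, x3)
r2 <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  summary(fit)$r.squared
})
print(r2)            # R^2 near 1 flags a redundant predictor
print(1 / (1 - r2))  # the same information expressed as VIFs
```

With a cutoff of R² > .9 (equivalently VIF > 10), all three predictors are flagged here, since any one of them is almost perfectly predicted by the other two.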

Of course, we must always bear in mind the specific context/goal of the analysis... Sometimes the only remedy is to repeat the research, but right now I'm interested in various ways of screening for redundant predictors when (multi)collinearity occurs in a regression model.

Answer

The kappa() function can help. Here is a simulated example:

> set.seed(42)
> x1 <- rnorm(100)
> x2 <- rnorm(100)
> x3 <- x1 + 2*x2 + rnorm(100)*0.0001    # so x3 approx a linear comb. of x1+x2
> mm12 <- model.matrix(~ x1 + x2)        # normal model, two indep. regressors
> mm123 <- model.matrix(~ x1 + x2 + x3)  # bad model with near collinearity
> kappa(mm12)                            # a 'low' kappa is good
[1] 1.166029
> kappa(mm123)                           # a 'high' kappa indicates trouble
[1] 121530.7

And we can go further by making the third regressor more and more collinear:

> x4 <- x1 + 2*x2 + rnorm(100)*0.000001  # even more collinear
> mm124 <- model.matrix(~ x1 + x2 + x4)
> kappa(mm124)
[1] 13955982
> x5 <- x1 + 2*x2                        # now x5 is linear comb of x1,x2
> mm125 <- model.matrix(~ x1 + x2 + x5)
> kappa(mm125)
[1] 1.067568e+16
>

This uses approximations; see help(kappa) for details.
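For reference, kappa()'s default estimate can be cross-checked against the exact condition number, which is the ratio of the largest to the smallest singular value of the model matrix (reusing mm123 from the transcript above):

```r
## Exact condition number of the near-collinear model matrix from above
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + 2*x2 + rnorm(100)*0.0001
mm123 <- model.matrix(~ x1 + x2 + x3)

sv <- svd(mm123)$d                 # singular values of the model matrix
print(max(sv) / min(sv))           # exact condition number: very large
print(kappa(mm123, exact = TRUE))  # kappa() computes the same value exactly
```

A well-conditioned matrix has a condition number near 1; values in the tens of thousands or more, as here, indicate near-collinearity.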
