Problem description
I have a dataset (data frame) with 5 columns all containing numeric values.
I'm looking to run a simple linear regression for each pair in the dataset.
For example, if the columns were named A, B, C, D, E, I want to run lm(A~B), lm(A~C), lm(A~D), ..., lm(D~E), and then I want to plot the data for each pair along with the regression line.
I'm pretty new to R, so I'm sort of spinning my wheels on how to actually accomplish this. Should I use ddply? Or lapply? I'm not really sure how to tackle this.
Recommended answer
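Here is an approach using combn:
# For every pair of column names, lm(DF[, x]) regresses the first column on the second
combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)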
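Since the question also asks about lapply, here is a minimal sketch of the same idea split into two steps: combn(..., simplify = FALSE) builds the list of column pairs, and lapply fits one model per pair. It uses reformulate() to build an explicit formula such as A ~ B, so formula(fit) tells you which pair a model belongs to. The three-column data frame is made up for illustration.

```r
# Small made-up data frame for illustration (hypothetical values)
set.seed(42)
DF <- data.frame(A = rnorm(20), B = rnorm(20), C = rnorm(20))

# All unordered pairs of column names, as a list of length-2 character vectors
pairs <- combn(names(DF), 2, simplify = FALSE)

# One simple regression per pair: first name is the response, second the predictor
fits <- lapply(pairs, function(p) {
  lm(reformulate(p[2], response = p[1]), data = DF)
})

# Label each fit by its formula, e.g. "A ~ B"
names(fits) <- vapply(pairs, paste, character(1), collapse = " ~ ")

# Inspect one model
formula(fits[["A ~ B"]])
coef(fits[["A ~ B"]])
```

The lapply version and the one-line combn version do the same work; the two-step form is simply easier to extend (for example, to plot or summarise each fit inside the same function).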
Example:
set.seed(1)
DF <- data.frame(A=rnorm(50, 100, 3),
B=rnorm(50, 100, 3),
C=rnorm(50, 100, 3),
D=rnorm(50, 100, 3),
E=rnorm(50, 100, 3))
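The question also asks for a plot of each pair with its regression line. Below is a minimal sketch using base graphics, assuming the simulated DF above: it draws one scatter plot per pair (predictor on the x-axis, response on the y-axis) and adds the fitted line with abline(). The 2 x 5 panel layout and the output file name are illustrative choices; in an interactive session you can drop the pdf()/dev.off() lines to plot on screen.

```r
# Same simulated data as in the example above
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

pairs <- combn(names(DF), 2, simplify = FALSE)

# Write all panels to a PDF in the temp directory (hypothetical file name)
out_file <- file.path(tempdir(), "pairwise_regressions.pdf")
pdf(out_file, width = 12, height = 6)
par(mfrow = c(2, 5))  # 10 pairs -> 2 rows x 5 columns of panels

fits <- lapply(pairs, function(p) {
  fit <- lm(reformulate(p[2], response = p[1]), data = DF)
  # Scatter plot: predictor on the x-axis, response on the y-axis
  plot(DF[[p[2]]], DF[[p[1]]], xlab = p[2], ylab = p[1],
       main = paste(p[1], "~", p[2]))
  abline(fit, col = "red")  # overlay the fitted regression line
  fit
})
invisible(dev.off())
```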
Update: adding @Henrik's suggestion (see comments)
# only the coefficients
> results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
> vars <- combn(names(DF), 2)
> names(results) <- vars[1 , ] # adding names to identify variables in the regression
> results
$A
(Intercept) B
103.66739418 -0.03354243
$A
(Intercept) C
97.88341555 0.02429041
$A
(Intercept) D
122.7606103 -0.2240759
$A
(Intercept) E
99.26387487 0.01038445
$B
(Intercept) C
99.971253525 0.003824755
$B
(Intercept) D
102.65399702 -0.02296721
$B
(Intercept) E
96.83042199 0.03524868
$C
(Intercept) D
80.1872211 0.1931079
$C
(Intercept) E
89.0503893 0.1050202
$D
(Intercept) E
107.84384655 -0.07620397
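In the output above, names(results) <- vars[1, ] labels each element only by its response variable, so $A appears four times and the pairs cannot be told apart by name. A minimal sketch of one way around this, assuming the same DF: paste both rows of vars into labels such as "A~B", and optionally collect the intercepts and slopes into a single data frame (coef_table is a name chosen here for illustration).

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# Coefficients for every pair, as in the answer above
results <- combn(names(DF), 2, function(x) coefficients(lm(DF[, x])),
                 simplify = FALSE)

# Label each element with both variables, e.g. "A~B", instead of just "A"
vars <- combn(names(DF), 2)
names(results) <- paste(vars[1, ], vars[2, ], sep = "~")

# One row per pair: response, predictor, intercept, slope
coef_table <- data.frame(
  response  = vars[1, ],
  predictor = vars[2, ],
  intercept = vapply(results, function(b) unname(b[1]), numeric(1)),
  slope     = vapply(results, function(b) unname(b[2]), numeric(1)),
  row.names = NULL
)
coef_table
```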