Problem description
I am new to R and I want to improve the following script with an *apply function (I have read about apply, but I couldn't manage to use it). I want to use the lm function on multiple independent variables (which are columns in a data frame). I used
for (i in 1:3) {
  assign(paste0('lm.', names(data[i])), lm(formula = formula(i), data = data))
}
where formula(i) is defined as
formula=function(x)
{
as.formula ( paste(names(data[x]),'~', paste0(names(data[-1:-3]), collapse = '+')), env=parent.frame() )
}
Thanks.
Recommended answer
If I understand you correctly, you are working with a dataset like this:
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
x1, x2 and x3 are covariates, and y1, y2, y3 are three independent responses. You are trying to fit three linear models:
y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
y3 ~ x1 + x2 + x3
Currently you are looping through y1, y2, y3, fitting one model at a time. You hope to speed the process up by replacing the for loop with lapply.
You are on the wrong track. lm() is an expensive operation. As long as your dataset is not small, the cost of the for loop is negligible. Replacing the for loop with lapply gives no performance gain.
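For readers who still want the lapply form for tidiness rather than speed, here is a minimal sketch. It assumes the simulated data frame dat shown above; the names responses and fits are illustrative and do not come from the question.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

responses <- c("y1", "y2", "y3")   # illustrative name

# One lm per response, kept in a named list instead of assign()-ed variables
fits <- lapply(responses, function(y) {
  lm(reformulate(c("x1", "x2", "x3"), response = y), data = dat)
})
names(fits) <- responses

# Coefficients of all three fits side by side, one column per response
sapply(fits, coef)
```

Keeping the models in a list avoids creating variables with assign(), but each model still performs its own QR factorization, so this is no faster than the for loop.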
Since all three models have the same RHS (right-hand side of ~), the model matrix is the same for all three. Therefore, the QR factorization needs to be done only once for all models. lm allows this, and you can use:
fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)
#Coefficients:
# y1 y2 y3
#(Intercept) -0.081155 0.042049 0.007261
#x1 -0.037556 0.181407 -0.070109
#x2 -0.334067 0.223742 0.015100
#x3 0.057861 -0.075975 -0.099762
If you check str(fit), you will see that this is not a list of three linear models; instead, it is a single linear model with a single $qr object, but with multiple LHS. So $coefficients, $residuals and $fitted.values are matrices. The resulting linear model has an additional "mlm" class besides the usual "lm" class. I created a special mlm tag collecting some questions on the theme, summarized by its tag wiki.
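The structure described above can be checked directly. This is a small sketch on the same simulated data; fit_y2 is an illustrative name for a separately fitted single-response model used only for comparison.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)

class(fit)            # "mlm" "lm"
dim(coef(fit))        # 4 coefficients by 3 responses
dim(residuals(fit))   # 30 observations by 3 responses

# Column "y2" should agree with a separate single-response fit
fit_y2 <- lm(y2 ~ x1 + x2 + x3, data = dat)
all.equal(coef(fit)[, "y2"], coef(fit_y2))
```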
If you have a lot more covariates, you can avoid typing or pasting the formula by using . on the RHS:
fit <- lm(cbind(y1, y2, y3) ~ ., data = dat)
#Coefficients:
# y1 y2 y3
#(Intercept) -0.081155 0.042049 0.007261
#x1 -0.037556 0.181407 -0.070109
#x2 -0.334067 0.223742 0.015100
#x3 0.057861 -0.075975 -0.099762
Caution: do not write
y1 + y2 + y3 ~ x1 + x2 + x3
This will treat y = y1 + y2 + y3 as a single response. Use cbind().
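The difference is easy to see on the simulated data. In this sketch, fit_sum and fit_multi are illustrative names; the final check relies on the fact that least-squares coefficients are linear in the response.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_sum   <- lm(y1 + y2 + y3 ~ x1 + x2 + x3, data = dat)       # one summed response
fit_multi <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)  # three responses

is.matrix(coef(fit_sum))     # FALSE: a single coefficient vector
is.matrix(coef(fit_multi))   # TRUE: one column per response

# The summed-response coefficients equal the row sums of the
# multi-response coefficient matrix
all.equal(unname(coef(fit_sum)), unname(rowSums(coef(fit_multi))))
```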
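I am interested in a generalization. I have a data frame df in which the first n columns are dependent variables (y1, y2, y3, ...) and the next m columns are independent variables (x1 + x2 + x3 + ...). For n = 3 and m = 3 it is the fit above. But how can I do this automatically, using the structure of df? I mean something like (for i in (1:n)) fit <- lm(cbind(df[something] ~ df[something], data = dat)). I have created that "something" with paste and paste0. Thanks.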
So you are programming your formula, or want to dynamically generate / construct model formulae in a loop. There are many ways to do this, and many Stack Overflow questions are about it. There are commonly two approaches:
- using reformulate;
- using paste / paste0 together with formula / as.formula.
I prefer reformulate for its neatness; however, it does not support multiple LHS in the formula. It also needs some special treatment if you want to transform the LHS. So in the following I will use the paste solution.
For your data frame df, you may do
paste0("cbind(", paste(names(df)[1:n], collapse = ", "), ")", " ~ .")
A nicer-looking way is to use sprintf and toString to construct the LHS:
sprintf("cbind(%s) ~ .", toString(names(df)[1:n]))
Here is an example using the iris dataset:
string_formula <- sprintf("cbind(%s) ~ .", toString(names(iris)[1:2]))
# "cbind(Sepal.Length, Sepal.Width) ~ ."
You can pass this string formula to lm, as lm will automatically coerce it into formula class. Or you may do the coercion yourself using formula (or as.formula):
formula(string_formula)
# cbind(Sepal.Length, Sepal.Width) ~ .
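Putting the pieces together, the generalization asked for might look like the sketch below. The helper multi_lhs_formula is hypothetical (it is not part of R); it assumes the first n columns of the data frame are the responses, every remaining column is a covariate, and all column names are syntactically valid.

```r
# Hypothetical helper: first n columns are responses, the rest are covariates
multi_lhs_formula <- function(df, n) {
  as.formula(sprintf("cbind(%s) ~ .", toString(names(df)[1:n])))
}

f   <- multi_lhs_formula(iris, 2)
fit <- lm(f, data = iris)

colnames(coef(fit))   # one column per response
rownames(coef(fit))   # intercept plus terms from the remaining columns
```

Because Species is a factor, the . on the RHS expands it into dummy-variable terms, so the coefficient matrix has more rows than iris has remaining columns.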
Remark:
This multiple LHS formula is also supported elsewhere in R core:
- the formula method for the function aggregate;
- ANOVA analysis with aov.
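As a brief illustration of both cases, here is a sketch on the iris dataset; the object names agg and fit_aov are illustrative.

```r
# aggregate(): group means of two responses in a single call
agg <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                 data = iris, FUN = mean)
agg

# aov(): a multi-response fit from a single call
fit_aov <- aov(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
class(fit_aov)     # includes "aov" and "mlm"
summary(fit_aov)   # one ANOVA table per response
```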