Problem description
I am trying to run a two-way ANOVA on multiple subsets of a data frame without having to subset the data manually, as that is inefficient.
Example data:
DF<-structure(list(Sample = c(666L, 676L, 686L, 667L, 677L, 687L,
822L, 832L, 842L, 824L, 834L, 844L), Time = c(300L, 300L, 300L,
300L, 300L, 300L, 400L, 400L, 400L, 400L, 400L, 400L), Ploidy = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2n",
"3n"), class = "factor"), Tissue = c("muscle", "muscle", "muscle",
"liver", "liver", "liver", "intestine", "intestine", "intestine",
"gill", "gill", "gill"), X.lipid = c(1.1, 0.8, 1.3, 3.7, 3.9,
3.8, 5.2, 3.4, 6, 7.6, 10.4, 6.7), l.dec = c(0.011, 0.008, 0.013,
0.037, 0.039, 0.038, 0.052, 0.034, 0.06, 0.076, 0.104, 0.067),
l.arc = c(0.105074124512229, 0.0895624074394449, 0.114266036973812,
0.193560218793138, 0.19879088899975, 0.196192082631721, 0.230059118691331,
0.185452088760136, 0.247467063170448, 0.279298057669285,
0.328359182374352, 0.261824790465914)), .Names = c("Sample",
"Time", "Ploidy", "Tissue", "X.lipid", "l.dec", "l.arc"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 69L, 70L, 71L, 72L, 73L, 74L), class = "data.frame")
I came across similar examples: "Anova, for loop to apply function" and "ANOVA on multiple responses, by multiple groups NOT part of formula".
I can get close, but I do not believe this is correct, as it uses aov rather than anova:
x<- unique(DF$Tissue)
sapply(x, function(my) {
f <- as.formula(paste("l.dec~Time*Ploidy"))
aov(f, data=DF)
}, simplify=FALSE)
If I switch aov for anova, it returns an error message:
Error in UseMethod("anova") :
no applicable method for 'anova' applied to an object of class "formula"
The long way around, which does give the correct result, is as follows:
#Subset by each Tissue type (just one here for e.g.)
muscle<- subset (DF, Tissue == "muscle")
#Perform Anova
anova(lm(l.dec ~ Ploidy * Time, data = muscle))
However, the main data frame has many tissue types, and I want to avoid performing this subset for each one.
I believe the sapply approach is close, but I need help with the final stages.
Recommended answer
Building on @user20650's comments and my own above, I would suggest first using sapply with lm to generate your list of models, and then using sapply again on that list to generate your ANOVA tables. That way the list of models remains available to you, so you can also extract coefficients, fitted values, residuals, and so on.
x <- unique(DF$Tissue)
models <- sapply(x, function(my) {
lm(l.dec ~ Time * Ploidy, data=DF, subset = Tissue==my)
}, simplify=FALSE)
ANOVA.tables <- sapply(models, anova, simplify=FALSE)
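As a rough sketch of the same two-step idea, here is a variant that uses split and lapply in place of sapply(..., simplify=FALSE). Because each tissue in the sample DF above has only a single Time and Ploidy value, the sketch runs on R's built-in CO2 data set instead. That substitution is my assumption for illustration only: Type stands in for Tissue, Treatment for Ploidy, and conc for Time.

```r
# Illustration on the built-in CO2 data set, not the asker's data.
# Assumed stand-ins: Type ~ Tissue, Treatment ~ Ploidy, conc ~ Time.
dat <- as.data.frame(CO2)

# Step 1: one data frame per group, then one fitted model per data frame
groups <- split(dat, dat$Type)
models <- lapply(groups, function(d) lm(uptake ~ Treatment * conc, data = d))

# Step 2: one ANOVA table per fitted model
anova_tables <- lapply(models, anova)

# The tables and the models are both named lists, indexed by group
anova_tables$Quebec
sapply(models, coef)  # coefficients side by side, one column per group
```

Keeping the models list separate from the ANOVA tables means anything else you need (residuals, fitted values, diagnostics) can be pulled from the same fits with another lapply or sapply call.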