Problem description
I am trying to run a two-way ANOVA on multiple subsets of a data frame without having to subset the data manually, as that is inefficient.
Example data:
DF<-structure(list(Sample = c(666L, 676L, 686L, 667L, 677L, 687L,
822L, 832L, 842L, 824L, 834L, 844L), Time = c(300L, 300L, 300L,
300L, 300L, 300L, 400L, 400L, 400L, 400L, 400L, 400L), Ploidy = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2n",
"3n"), class = "factor"), Tissue = c("muscle", "muscle", "muscle",
"liver", "liver", "liver", "intestine", "intestine", "intestine",
"gill", "gill", "gill"), X.lipid = c(1.1, 0.8, 1.3, 3.7, 3.9,
3.8, 5.2, 3.4, 6, 7.6, 10.4, 6.7), l.dec = c(0.011, 0.008, 0.013,
0.037, 0.039, 0.038, 0.052, 0.034, 0.06, 0.076, 0.104, 0.067),
l.arc = c(0.105074124512229, 0.0895624074394449, 0.114266036973812,
0.193560218793138, 0.19879088899975, 0.196192082631721, 0.230059118691331,
0.185452088760136, 0.247467063170448, 0.279298057669285,
0.328359182374352, 0.261824790465914)), .Names = c("Sample",
"Time", "Ploidy", "Tissue", "X.lipid", "l.dec", "l.arc"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 69L, 70L, 71L, 72L, 73L, 74L), class = "data.frame")
I have come across similar examples: "Anova, for loop to apply function" and "ANOVA on multiple responses, by multiple groups NOT part of formula".
I can get close, but I do not believe this is correct, as it uses aov rather than anova:
x<- unique(DF$Tissue)
sapply(x, function(my) {
f <- as.formula(paste("l.dec~Time*Ploidy"))
aov(f, data=DF)
}, simplify=FALSE)
If I switch aov for anova, it returns an error message:
Error in UseMethod("anova") :
no applicable method for 'anova' applied to an object of class "formula"
The long-winded but correct way is:
#Subset by each Tissue type (just one here for e.g.)
muscle<- subset (DF, Tissue == "muscle")
#Perform Anova
anova(lm(l.dec ~ Ploidy * Time, data = muscle))
However, the main data frame contains many tissue types, and I want to avoid performing this subset for each one.
I believe the sapply approach is close, but I need help with the final steps.
Recommended answer
Building on @user20650's comments and my own above, I would suggest first using sapply with lm to generate a list of models, and then using sapply again on that list to generate the ANOVA tables. That way the list of models remains available, so you can extract coefficients, fitted values, residuals, and so on.
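x <- unique(DF$Tissue)
# Fit one model per tissue; the subset argument restricts the rows used in each fit
models <- sapply(x, function(my) {
  lm(l.dec ~ Time * Ploidy, data = DF, subset = Tissue == my)
}, simplify = FALSE)
# One ANOVA table per fitted model
ANOVA.tables <- sapply(models, anova, simplify = FALSE)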
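As a rough illustration of why keeping the list of models is handy, here is a minimal sketch on a made-up data frame. The data frame `toy`, its random `l.dec` values, and the names `anova_tables`, `coefs` and `p_interaction` are assumptions for illustration only and are not part of the question. It assumes every tissue contains both ploidy levels and both time points, since lm cannot fit a factor that has a single level within a subset.

```r
# Hypothetical balanced data (an assumption, not the asker's data):
# every tissue has both ploidy levels at both time points, 3 replicates each.
set.seed(1)
toy <- expand.grid(
  Rep    = 1:3,
  Time   = c(300L, 400L),
  Ploidy = factor(c("2n", "3n")),
  Tissue = c("muscle", "liver"),
  stringsAsFactors = FALSE
)
toy$l.dec <- round(runif(nrow(toy), min = 0.01, max = 0.1), 3)

# Same pattern as the answer: one lm fit per tissue, stored in a named list.
tissues <- unique(toy$Tissue)
models <- sapply(tissues, function(my) {
  lm(l.dec ~ Time * Ploidy, data = toy, subset = Tissue == my)
}, simplify = FALSE)

# One ANOVA table per model.
anova_tables <- lapply(models, anova)

# Because the models are kept, other quantities can be pulled out later:
# a matrix of coefficients with one column per tissue ...
coefs <- sapply(models, coef)

# ... and the p-value of the Time:Ploidy interaction for each tissue.
p_interaction <- sapply(anova_tables, function(tab) tab["Time:Ploidy", "Pr(>F)"])

print(coefs)
print(p_interaction)
```

Individual results can then be looked up by tissue name, for example `anova_tables[["liver"]]` or `residuals(models[["muscle"]])`.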