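I'd like to add the results of a Tukey HSD post-hoc test to a ggplot2 boxplot. This SO answer contains a manual example of what I want (i.e., the letters on the plot were added by hand; groups that share a letter are indistinguishable, p > whatever).
Is there an automatic function, based on an AOV and Tukey HSD post-hoc analysis, that adds letters like these to a boxplot?
I don't think such a function would be hard to write. It would look something like this: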
library(ggplot2)
set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)
p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot()
a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a)
# Function to generate a data frame of factor levels and corresponding labels
generate_label_df <- function(HSD, factor_levels) {
  comparisons <- rownames(HSD$lev)
  p.vals <- HSD$lev[, "p adj"]
  ## Somehow create a vector of letters, one for each factor level,
  ## generated using `comparisons` and `p.vals` (placeholder below)
  labels <- rep(NA_character_, length(factor_levels))
  letter_df <- data.frame(lev = factor_levels, labels = labels)
  letter_df
}
# Add the labels to the plot
p_base +
  geom_text(data = generate_label_df(tHSD, levels(d$lev)), aes(x = lev, y = 0, label = labels))
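I realize that TukeyHSD objects have a plot method, and that there is another package (which I can't seem to find right now) that does what I'm describing in base graphics, but I would really prefer to do this in ggplot2.

Best answer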
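You can use 'multcompLetters' from the 'multcompView' package to generate letters for homogeneous groups after a Tukey HSD test. From there, it is a matter of extracting the group label for each factor level tested in the Tukey HSD, along with the highest value of each group as displayed in the boxplot, so that the label can be placed just above that level.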
library(plyr)
library(ggplot2)
library(multcompView)
set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)
a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)
generate_label_df <- function(HSD, flev) {
  # Extract the adjusted p-values (4th column, "p adj") from the Tukey post-hoc result
  Tukey.levels <- HSD[[flev]][, 4]
  # Assign letters to the factor levels; levels sharing a letter do not differ significantly
  Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
  plot.labels <- names(Tukey.labels[['Letters']])
  # Get the highest value of Tukey's five-number summary for each group and add
  # a bit of space between it and the label
  # (note: this reads the data frame `d` and its column `y` from the global environment)
  boxplot.df <- ddply(d, flev, function(x) max(fivenum(x$y)) + 0.2)
  # Create a data frame out of the factor levels and Tukey's homogeneous group letters
  plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],
                            stringsAsFactors = FALSE)
  # Merge it with the label positions
  labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)
  return(labels.df)
}
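To see what the letter-assignment step does on its own, here is a minimal sketch that calls multcompLetters directly. The p-values are made up for illustration (they are not from the data above); the names follow the "B-A" pattern that TukeyHSD uses for its row names.

```r
library(multcompView)

# Hypothetical adjusted p-values, named like TukeyHSD comparisons ("B-A")
p_adj <- c("2-1" = 0.90, "3-1" = 0.001, "3-2" = 0.002)

# Default threshold is 0.05: pairs with p below it get different letters
letters_out <- multcompLetters(p_adj)$Letters

# A named character vector, one letter string per factor level
print(letters_out)
```

Here levels 1 and 2 end up sharing a letter, while level 3 gets a different one. As far as I know, the level names are recovered by splitting the comparison names on the hyphen, so factor levels that themselves contain a hyphen may need renaming first.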
Generate the ggplot:
p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot() +
  geom_text(data = generate_label_df(tHSD, 'lev'), aes(x = plot.labels, y = V1, label = labels))
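If you would rather not depend on plyr, or on a data frame named d existing in the global environment, the same idea can be sketched with base R. The function name label_df_base and its arguments (data, yvar, offset) are my own inventions for illustration, not part of any package; treat this as one possible variant rather than the canonical way to do it.

```r
library(ggplot2)
library(multcompView)

# Hypothetical helper: takes the data and column names explicitly
label_df_base <- function(data, HSD, flev, yvar, offset = 0.2) {
  # Letters for homogeneous groups, from the adjusted p-values
  grp <- multcompLetters(HSD[[flev]][, "p adj"])$Letters
  # Highest observation per factor level, via base R tapply()
  tops <- tapply(data[[yvar]], data[[flev]], max)
  out <- data.frame(level = names(grp),
                    labels = unname(grp),
                    y = as.numeric(tops[names(grp)]) + offset,
                    stringsAsFactors = FALSE)
  # Name the columns after the plot's x and y variables so geom_text can inherit aes
  names(out)[names(out) == "level"] <- flev
  names(out)[names(out) == "y"] <- yvar
  out
}

set.seed(0)
d <- data.frame(lev = gl(3, 10),
                y = c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3))
tHSD <- TukeyHSD(aov(y ~ lev, data = d))
lab <- label_df_base(d, tHSD, "lev", "y")

# x and y are inherited from the main aes(); only the label is mapped here
p <- ggplot(d, aes(x = lev, y = y)) +
  geom_boxplot() +
  geom_text(data = lab, aes(label = labels))
```

Because the label data frame reuses the column names lev and y, the text layer needs no separate x and y mappings. Printing p draws the plot.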