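I want to add the results of a Tukey HSD post-hoc test to a ggplot2 boxplot. This SO answer contains a manual example of what I want (i.e., the letters on the plot were added by hand; groups that share a letter are indistinguishable, p > whatever).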

Is there an automatic function, based on the AOV and the Tukey HSD post-hoc analysis, that adds letters like these to a boxplot?

I don't think it would be too hard to write such a function. It would look something like this:

library(ggplot2)

set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)

p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot()

a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a)

# Function to generate a data frame of factor levels and corresponding labels
generate_label_df <- function(HSD, factor_levels) {
  comparisons <- rownames(HSD$lev)
  p.vals <- HSD$lev[ , "p adj"]

  ## Somehow create a vector of letters
  labels <- # A vector of letters, one for each factor level, generated using `comparisons` and `p.vals`
  letter_df <- data.frame(lev=factor_levels, labels=labels)
  letter_df
}

# Add the labels to the plot
p_base +
  geom_text(data=generate_label_df(tHSD, levels(d$lev)), aes(x=lev, y=0, label=labels))

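I realize that TukeyHSD objects have a plot method, and that there is another package (which I can't seem to find right now) that does what I describe in base graphics, but I would really prefer to do this in ggplot2.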

Best answer

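You can use 'multcompLetters' from the 'multcompView' package to generate letters for the homogeneous groups after a Tukey HSD test. From there, extract the group label for each level of the factor tested in the Tukey HSD, along with the highest value shown in the boxplot for that level, so that each label can be placed just above it.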
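As a rough illustration of what 'multcompLetters' does on its own, here is a minimal sketch with made-up adjusted p-values (the numbers are hypothetical, not taken from the data used in this answer). The names follow the `level-level` pattern that `TukeyHSD` uses for its row names; levels whose comparison is not significant at the default threshold of 0.05 end up sharing a letter.

```r
library(multcompView)

# Hypothetical adjusted p-values, named "levelA-levelB" like TukeyHSD() row names.
# These numbers are invented for illustration only.
p_adj <- c("2-1" = 0.90, "3-1" = 0.001, "3-2" = 0.002)

# Default: compare = "<", threshold = 0.05
letters_out <- multcompLetters(p_adj)

# Named character vector: one letter string per factor level
group_letters <- letters_out$Letters
print(group_letters)
```

Here levels 1 and 2 should share a letter while level 3 gets its own. The full code that follows applies the same call to the `p adj` column of the `TukeyHSD` result.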

library(plyr)
library(ggplot2)
library(multcompView)

set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)

a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)

generate_label_df <- function(HSD, flev){
  # Extract the adjusted p-values (4th column, "p adj") for each pairwise comparison
  Tukey.levels <- HSD[[flev]][,4]
  # Convert them to letters; levels that share a letter are not significantly different
  Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
  plot.labels <- names(Tukey.labels[['Letters']])

  # Get the highest value of Tukey's five-number summary (the group maximum) and add
  # a bit of space to buffer between it and the label placement.
  # Note: this reads the data frame `d` and its column `y` from the global environment.
  boxplot.df <- ddply(d, flev, function (x) max(fivenum(x$y)) + 0.2)

  # Create a data frame out of the factor levels and Tukey's homogeneous group letters
  plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],
                            stringsAsFactors = FALSE)

  # Merge it with the label positions
  labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)

  return(labels.df)
}

# Generate the ggplot
p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot() +
  geom_text(data = generate_label_df(tHSD, 'lev'), aes(x = plot.labels, y = V1, label = labels))
p_base
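If you would rather not depend on plyr or on a global `d`, the same idea can be sketched with base R's `tapply`. This is only one possible variant, not part of the answer above: the helper name `label_df_base`, its arguments, and the output column names `group`, `y`, and `label` are all made up for this example. It uses the group maximum directly, which is the same value as `max(fivenum(x))`.

```r
library(ggplot2)
library(multcompView)

# Hypothetical helper (name and arguments are assumptions, not from the answer):
# takes the TukeyHSD result plus the data and column names explicitly.
label_df_base <- function(HSD, data, group, response, offset = 0.2) {
  # Adjusted p-values, named "levelA-levelB"
  p_adj <- HSD[[group]][, "p adj"]
  # One letter string per factor level
  group_letters <- multcompLetters(p_adj)$Letters

  # Highest observed value per group (equal to max(fivenum(x)))
  y_top <- tapply(data[[response]], data[[group]], max)

  # Index both results by level name so they line up
  lev <- levels(data[[group]])
  data.frame(
    group = factor(lev, levels = lev),
    y = as.numeric(y_top[lev]) + offset,
    label = unname(group_letters[lev]),
    stringsAsFactors = FALSE
  )
}

# Same simulated data as in the question
set.seed(0)
d <- data.frame(lev = gl(3, 10),
                y = c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3))
tHSD <- TukeyHSD(aov(y ~ lev, data = d))

labs <- label_df_base(tHSD, d, group = "lev", response = "y")

p <- ggplot(d, aes(x = lev, y = y)) +
  geom_boxplot() +
  geom_text(data = labs, aes(x = group, y = y, label = label))
```

Printing `p` draws the boxplot with one letter above each box. Indexing by level name instead of merging keeps the rows in factor-level order, though it assumes the names returned by `multcompLetters` match the factor levels exactly.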
