R ggplot2: boxplots with Wilcoxon significance levels, and facets. Show only significant comparisons with asterisks

Problem description

Following up on this question and for the sake of completeness, I modified the accepted answer and customized the resulting plot, but I am still facing some important problems.

To sum up, I am doing boxplots reflecting the significance of Kruskal-Wallis and pairwise Wilcoxon test comparisons.

I want to replace the p-value numbers with asterisks and show only the significant comparisons, reducing the vertical spacing as much as possible.

Basically I want to do this, but with the added problem of facets, which messes everything up.

So far I have worked on a very decent MWE, but it still shows problems...

    library(reshape2)
    library(ggplot2)
    library(gridExtra)
    library(tidyverse)
    library(data.table)
    library(ggsignif)
    library(RColorBrewer)

    data(iris)
    iris$treatment <- rep(c("A","B"), length(iris$Species)/2)
    mydf <- melt(iris, measure.vars=names(iris)[1:4])
    mydf$treatment <- as.factor(mydf$treatment)
    mydf$variable <- factor(mydf$variable, levels=sort(levels(mydf$variable)))
    mydf$both <- factor(paste(mydf$treatment, mydf$variable), levels=(unique(paste(mydf$treatment, mydf$variable))))

    # Change data to reduce number of statistically significant differences
    set.seed(2)
    mydf <- mydf %>% mutate(value=rnorm(nrow(mydf)))

    #### FIRST TEST BOTH

    # Kruskal-Wallis
    addkw <- as.data.frame(mydf %>% group_by(Species) %>%
                             summarize(p.value = kruskal.test(value ~ both)$p.value))
    #addkw$p.adjust <- p.adjust(addkw$p.value, "BH")
    a <- combn(levels(mydf$both), 2, simplify = FALSE)

    # new p.values
    pv.final <- data.frame()
    for (gr in unique(mydf$Species)){
      for (i in 1:length(a)){
        tis <- a[[i]] #variable pair to test
        as <- subset(mydf, Species==gr & both %in% tis)
        pv <- wilcox.test(value ~ both, data=as)$p.value
        ddd <- data.table(as)
        asm <- as.data.frame(ddd[, list(value=mean(value)), by=list(both=both)])
        asm2 <- dcast(asm, .~both, value.var="value")[,-1]
        pf <- data.frame(group1=paste(tis[1], gr), group2=paste(tis[2], gr), mean.group1=asm2[,1], mean.group2=asm2[,2], FC.1over2=asm2[,1]/asm2[,2], p.value=pv)
        pv.final <- rbind(pv.final, pf)
      }
    }
    #pv.final$p.adjust <- p.adjust(pv.final$p.value, method="BH")
    pv.final$map.signif <- ifelse(pv.final$p.value > 0.05, "", ifelse(pv.final$p.value > 0.01,"*", "**"))

    cols <- colorRampPalette(brewer.pal(length(unique(mydf$Species)), "Set1"))
    myPal <- cols(length(unique(mydf$Species)))

    # Function to get a list of plots to use as "facets" with grid.arrange
    plot.list=function(mydf, pv.final, addkw, a, myPal){
      mylist <- list()
      i <- 0
      for (sp in unique(mydf$Species)){
        i <- i+1
        mydf0 <- subset(mydf, Species==sp)
        addkw0 <- subset(addkw, Species==sp)
        pv.final0 <- pv.final[grep(sp, pv.final$group1), ]
        num.signif <- sum(pv.final0$p.value <= 0.05)
        P <- ggplot(mydf0,aes(x=both, y=value)) +
          geom_boxplot(aes(fill=Species)) +
          stat_summary(fun.y=mean, geom="point", shape=5, size=4) +
          facet_grid(~Species, scales="free", space="free_x") +
          scale_fill_manual(values=myPal[i]) + #WHY IS COLOR IGNORED?
          geom_text(data=addkw0, hjust=0, size=4.5, aes(x=0, y=round(max(mydf0$value, na.rm=TRUE)+0.5), label=paste0("KW p=",p.value))) +
          geom_signif(test="wilcox.test", comparisons = a[which(pv.final0$p.value<=0.05)],#I can use "a"here
                      map_signif_level = F,
                      vjust=0,
                      textsize=4,
                      size=0.5,
                      step_increase = 0.05)
        if (i==1){
          P <- P + theme(legend.position="none",
                         axis.text.x=element_text(size=20, angle=90, hjust=1),
                         axis.text.y=element_text(size=20),
                         axis.title=element_blank(),
                         strip.text.x=element_text(size=20,face="bold"),
                         strip.text.y=element_text(size=20,face="bold"))
        } else{
          P <- P + theme(legend.position="none",
                         axis.text.x=element_text(size=20, angle=90, hjust=1),
                         axis.text.y=element_blank(),
                         axis.ticks.y=element_blank(),
                         axis.title=element_blank(),
                         strip.text.x=element_text(size=20,face="bold"),
                         strip.text.y=element_text(size=20,face="bold"))
        }
        #WHY USING THE CODE BELOW TO CHANGE NUMBERS TO ASTERISKS I GET ERRORS?
        #P2 <- ggplot_build(P)
        #P2$data[[3]]$annotation <- rep(subset(pv.final0, p.value<=0.05)$map.signif, each=3)
        #P <- plot(ggplot_gtable(P2))
        mylist[[sp]] <- list(num.signif, P)
      }
      return(mylist)
    }

    p.list <- plot.list(mydf, pv.final, addkw, a, myPal)
    y.rng <- range(mydf$value)

    # Get the highest number of significant p-values across all three "facets"
    height.factor <- 0.3
    max.signif <- max(sapply(p.list, function(x) x[[1]]))

    # Lay out the three plots as facets (one for each Species), but adjust so that y-range is same for each facet.
    # Top of y-range is adjusted using max_signif.
    png(filename="test.png", height=800, width=1200)
    grid.arrange(grobs=lapply(p.list, function(x) x[[2]] +
                   scale_y_continuous(limits=c(y.rng[1], y.rng[2] + height.factor*max.signif))),
                 ncol=length(unique(mydf$Species)), top="Random title", left="Value")
    #HOW TO CHANGE THE SIZE OF THE TITLE AND THE Y AXIS TEXT?
    #HOW TO ADD A COMMON LEGEND?
    dev.off()

It produces the following plot:

As you can see there are some problems, most obviously:

1- Coloring does not work for some reason
2- I do not seem to be able to change the annotation with the asterisks

I want something more like this (mockup):

So we need to:

1- Make coloring work
2- Show asterisks instead of numbers

...and for the win:

3- Make a common legend
4- Place Kruskal-Wallis line on top
5- Change the size (and alignment) of the title and y axis text

IMPORTANT NOTES

I would appreciate it if my code were left as intact as possible, even if it isn't the prettiest, because I still have to make use of intermediate objects like "CNb" or "pv.final".

The solution should be easily transferable to other cases; please consider testing "variable" alone, instead of "both"... In this case we have 6 "facets" (vertically and horizontally) and everything gets even more screwed up...

I made this other MWE:

    ## NOW TEST MEASURE, TO GET VERTICAL AND HORIZONTAL FACETS
    addkw <- as.data.frame(mydf %>% group_by(treatment, Species) %>%
                             summarize(p.value = kruskal.test(value ~ variable)$p.value))
    #addkw$p.adjust <- p.adjust(addkw$p.value, "BH")
    a <- combn(levels(mydf$variable), 2, simplify = FALSE)

    # new p.values
    pv.final <- data.frame()
    for (tr in levels(mydf$treatment)){
      for (gr in levels(mydf$Species)){
        for (i in 1:length(a)){
          tis <- a[[i]] #variable pair to test
          as <- subset(mydf, treatment==tr & Species==gr & variable %in% tis)
          pv <- wilcox.test(value ~ variable, data=as)$p.value
          ddd <- data.table(as)
          asm <- as.data.frame(ddd[, list(value=mean(value, na.rm=T)), by=list(variable=variable)])
          asm2 <- dcast(asm, .~variable, value.var="value")[,-1]
          pf <- data.frame(group1=paste(tis[1], gr, tr), group2=paste(tis[2], gr, tr), mean.group1=asm2[,1], mean.group2=asm2[,2], FC.1over2=asm2[,1]/asm2[,2], p.value=pv)
          pv.final <- rbind(pv.final, pf)
        }
      }
    }
    #pv.final$p.adjust <- p.adjust(pv.final$p.value, method="BH")

    # set signif level
    pv.final$map.signif <- ifelse(pv.final$p.value > 0.05, "", ifelse(pv.final$p.value > 0.01,"*", "**"))

    plot.list2=function(mydf, pv.final, addkw, a, myPal){
      mylist <- list()
      i <- 0
      for (sp in unique(mydf$Species)){
        for (tr in unique(mydf$treatment)){
          i <- i+1
          mydf0 <- subset(mydf, Species==sp & treatment==tr)
          addkw0 <- subset(addkw, Species==sp & treatment==tr)
          pv.final0 <- pv.final[grep(paste(sp,tr), pv.final$group1), ]
          num.signif <- sum(pv.final0$p.value <= 0.05)
          P <- ggplot(mydf0,aes(x=variable, y=value)) +
            geom_boxplot(aes(fill=Species)) +
            stat_summary(fun.y=mean, geom="point", shape=5, size=4) +
            facet_grid(treatment~Species, scales="free", space="free_x") +
            scale_fill_manual(values=myPal[i]) + #WHY IS COLOR IGNORED?
            geom_text(data=addkw0, hjust=0, size=4.5, aes(x=0, y=round(max(mydf0$value, na.rm=TRUE)+0.5), label=paste0("KW p=",p.value))) +
            geom_signif(test="wilcox.test", comparisons = a[which(pv.final0$p.value<=0.05)],#I can use "a"here
                        map_signif_level = F,
                        vjust=0,
                        textsize=4,
                        size=0.5,
                        step_increase = 0.05)
          if (i==1){
            P <- P + theme(legend.position="none",
                           axis.text.x=element_blank(),
                           axis.text.y=element_text(size=20),
                           axis.title=element_blank(),
                           axis.ticks.x=element_blank(),
                           strip.text.x=element_text(size=20,face="bold"),
                           strip.text.y=element_text(size=20,face="bold"))
          }
          if (i==4){
            P <- P + theme(legend.position="none",
                           axis.text.x=element_text(size=20, angle=90, hjust=1),
                           axis.text.y=element_text(size=20),
                           axis.title=element_blank(),
                           strip.text.x=element_text(size=20,face="bold"),
                           strip.text.y=element_text(size=20,face="bold"))
          }
          if ((i==2)|(i==3)){
            P <- P + theme(legend.position="none",
                           axis.text.x=element_blank(),
                           axis.text.y=element_blank(),
                           axis.title=element_blank(),
                           axis.ticks.x=element_blank(),
                           axis.ticks.y=element_blank(),
                           strip.text.x=element_text(size=20,face="bold"),
                           strip.text.y=element_text(size=20,face="bold"))
          }
          if ((i==5)|(i==6)){
            P <- P + theme(legend.position="none",
                           axis.text.x=element_text(size=20, angle=90, hjust=1),
                           axis.text.y=element_blank(),
                           #axis.ticks.y=element_blank(), #WHY SPECIFYING THIS GIVES ERROR?
                           axis.title=element_blank(),
                           axis.ticks.y=element_blank(),
                           strip.text.x=element_text(size=20,face="bold"),
                           strip.text.y=element_text(size=20,face="bold"))
          }
          #WHY USING THE CODE BELOW TO CHANGE NUMBERS TO ASTERISKS I GET ERRORS?
          #P2 <- ggplot_build(P)
          #P2$data[[3]]$annotation <- rep(subset(pv.final0, p.value<=0.05)$map.signif, each=3)
          #P <- plot(ggplot_gtable(P2))
          sptr <- paste(sp,tr)
          mylist[[sptr]] <- list(num.signif, P)
        }
      }
      return(mylist)
    }

    p.list2 <- plot.list2(mydf, pv.final, addkw, a, myPal)
    y.rng <- range(mydf$value)

    # Get the highest number of significant p-values across all three "facets"
    height.factor <- 0.5
    max.signif <- max(sapply(p.list2, function(x) x[[1]]))

    # Lay out the three plots as facets (one for each Species), but adjust so that y-range is same for each facet.
    # Top of y-range is adjusted using max_signif.
    png(filename="test2.png", height=800, width=1200)
    grid.arrange(grobs=lapply(p.list2, function(x) x[[2]] +
                   scale_y_continuous(limits=c(y.rng[1], y.rng[2] + height.factor*max.signif))),
                 ncol=length(unique(mydf$Species)), top="Random title", left="Value")
    #HOW TO CHANGE THE SIZE OF THE TITLE AND THE Y AXIS TEXT?
    #HOW TO ADD A COMMON LEGEND?
    dev.off()

That produces the following plot:

Now the color problem becomes more striking, the facet heights are uneven, and something should be done with the redundant facet strip texts too.

I am stuck at this point, so I would appreciate any help. Sorry for the long question, but I think it is almost there! Thanks!!

Solution

You can try the following. As your code is really busy and, for me, too complicated to understand, I suggest a different approach. I tried to avoid loops and to use the tidyverse as much as possible. Thus, first I created your data. Then I calculated the Kruskal-Wallis tests, as this was not possible within ggsignif. Afterwards I plot all p-values using geom_signif.
Finally, the non-significant ones are removed and a step increase is added.

1- Make coloring work: done
2- Show asterisks instead of numbers: done

...and for the win:

3- Make a common legend: done
4- Place Kruskal-Wallis line on top: done, but I placed the values at the bottom
5- Change the size (and alignment) of the title and y axis text: done

    library(tidyverse)
    library(ggsignif)

    # 1. your data
    set.seed(2)
    df <- as_tibble(iris) %>%   # as_tibble() replaces the deprecated as.tbl()
      mutate(treatment=rep(c("A","B"), length(iris$Species)/2)) %>%
      gather(key, value, -Species, -treatment) %>%
      mutate(value=rnorm(n())) %>%
      mutate(key=factor(key, levels=unique(key))) %>%
      mutate(both=interaction(treatment, key, sep = " "))

    # 2. Kruskal test
    KW <- df %>%
      group_by(Species) %>%
      summarise(p=round(kruskal.test(value ~ both)$p.value,2),
                y=min(value),
                x=1) %>%
      mutate(y=min(y))

    # 3. Plot
    P <- df %>%
      ggplot(aes(x=both, y=value)) +
      geom_boxplot(aes(fill=Species)) +
      facet_grid(~Species) +
      ylim(-3,7)+
      theme(axis.text.x = element_text(angle=45, hjust=1)) +
      geom_signif(comparisons = combn(levels(df$both),2,simplify = F),
                  map_signif_level = T) +
      # fun= replaces the deprecated fun.y= (ggplot2 >= 3.3.0)
      stat_summary(fun=mean, geom="point", shape=5, size=4) +
      xlab("") +
      geom_text(data=KW,aes(x, y=y, label=paste0("KW p=",p)),hjust=0) +
      ggtitle("Plot") + ylab("This is my own y-lab")

    # 4. remove not significant values and add step increase
    P_new <- ggplot_build(P)
    P_new$data[[2]] <- P_new$data[[2]] %>%
      filter(annotation != "NS.") %>%
      group_by(PANEL) %>%
      mutate(index=(as.numeric(group[drop=T])-1)*0.5) %>%
      mutate(y=y+index,
             yend=yend+index) %>%
      select(-index) %>%
      as.data.frame()

    # the final plot
    plot(ggplot_gtable(P_new))

A similar approach using two facets:

    # --------------------
    # 5. Kruskal
    KW <- df %>%
      group_by(Species, treatment) %>%
      summarise(p=round(kruskal.test(value ~ both)$p.value,2),
                y=min(value),
                x=1) %>%
      ungroup() %>%
      mutate(y=min(y))

    # 6. Plot with two facets
    P <- df %>%
      ggplot(aes(x=key, y=value)) +
      geom_boxplot(aes(fill=Species)) +
      facet_grid(treatment~Species) +
      ylim(-5,7)+
      theme(axis.text.x = element_text(angle=45, hjust=1)) +
      geom_signif(comparisons = combn(levels(df$key),2,simplify = F),
                  map_signif_level = T) +
      # fun= replaces the deprecated fun.y= (ggplot2 >= 3.3.0)
      stat_summary(fun=mean, geom="point", shape=5, size=4) +
      xlab("") +
      geom_text(data=KW,aes(x, y=y, label=paste0("KW p=",p)),hjust=0) +
      ggtitle("Plot") + ylab("This is my own y-lab")

    # 7. remove not significant values and add step increase
    P_new <- ggplot_build(P)
    P_new$data[[2]] <- P_new$data[[2]] %>%
      filter(annotation != "NS.") %>%
      group_by(PANEL) %>%
      mutate(index=(as.numeric(group[drop=T])-1)*0.5) %>%
      mutate(y=y+index,
             yend=yend+index) %>%
      select(-index) %>%
      as.data.frame()

    # the final plot
    plot(ggplot_gtable(P_new))

Edit.

Regarding your p.adjust needs, you can set up a function of your own and call it directly within geom_signif().

    wilcox.test.BH.adjusted <- function(x,y,n){
      tmp <- wilcox.test(x,y)
      tmp$p.value <- p.adjust(tmp$p.value, n = n,method = "BH")
      tmp
    }

    geom_signif(comparisons = combn(levels(df$both),2,simplify = F),
                map_signif_level = T, test = "wilcox.test.BH.adjusted",
                test.args = list(n=8))

The challenge is to know how many independent tests you will have in the end. Then you can set n yourself. Here I used 8, but this may be wrong.
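A note on the adjustment function above: when p.adjust() receives a single p-value together with n, the BH formula reduces to multiplying that p-value by n (capped at 1), which is the same as a Bonferroni correction. The following is a minimal base-R sketch of that point, not part of the original answer. It assumes toy data, and the names groups, values, pairs, raw_p and adj_p are hypothetical. As one possible alternative, it adjusts all pairwise p-values of a single panel together, so that the number of tests is simply the number of pairs.

```r
# Toy data: four groups of random values (hypothetical, not the iris-based data above)
set.seed(2)
groups <- factor(rep(c("A", "B", "C", "D"), each = 25))
values <- rnorm(length(groups))

# All pairwise comparisons between the four groups: choose(4, 2) = 6 pairs
pairs <- combn(levels(groups), 2, simplify = FALSE)
n_tests <- length(pairs)

# Raw Wilcoxon p-value for each pair
raw_p <- vapply(pairs, function(pr) {
  wilcox.test(values[groups == pr[1]], values[groups == pr[2]])$p.value
}, numeric(1))

# BH adjustment applied to the whole family of p-values at once
adj_p <- p.adjust(raw_p, method = "BH")

# Adjusting one p-value at a time with n = n_tests gives the Bonferroni result
single_bh  <- vapply(raw_p, p.adjust, numeric(1), method = "BH", n = n_tests)
bonferroni <- p.adjust(raw_p, method = "bonferroni")
```

Whether the family of tests should be one panel or the whole figure is a judgment call that depends on the analysis, which is the uncertainty about n that the answer mentions.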
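The "remove not significant values and add step increase" step in the answer can be hard to follow because it edits the data returned by ggplot_build(). Here is a small base-R sketch of the same logic on a toy data frame. The data frame layer and its values are hypothetical stand-ins for the geom_signif layer data, simplified to one row per bracket; it only illustrates how the index is computed per panel.

```r
# Hypothetical stand-in for the geom_signif layer data (one row per bracket)
layer <- data.frame(
  PANEL      = factor(c(1, 1, 1, 1, 2, 2)),
  group      = factor(c("A-B-1", "A-C-2", "A-D-3", "B-C-4", "A-B-1", "C-D-6")),
  annotation = c("*", "NS.", "**", "NS.", "NS.", "*"),
  y          = c(3, 3, 3, 3, 3, 3),
  yend       = c(3, 3, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Keep only the significant comparisons
kept <- layer[layer$annotation != "NS.", ]

# Within each panel, rank the remaining brackets 0, 1, 2, ... and turn the
# rank into a vertical offset of 0.5 per bracket
kept$index <- ave(
  as.numeric(kept$group), kept$PANEL,
  FUN = function(g) (as.numeric(factor(g)) - 1) * 0.5
)

# Shift each bracket by its offset so the kept brackets stack without gaps
kept$y    <- kept$y + kept$index
kept$yend <- kept$yend + kept$index
kept$index <- NULL
```

In panel 1 the two remaining brackets end up at heights 3 and 3.5, and the single bracket in panel 2 stays at 3, so the space left by the removed comparisons is closed.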