Overlaying geom_point on geom_histogram or stat_bin

Question

I want to use ggplot to plot a histogram (or a step plot using stat_bin) and overlay a few points on it using geom_point.

Here is the base R implementation:

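library(plotrix)  # provides color.id()
set.seed(10)
df <- data.frame(id = LETTERS, val = rnorm(length(LETTERS)))
selected.ids <- sample(LETTERS, 3, replace = FALSE)
h <- hist(df$val, plot = FALSE, breaks = 10)
# one named colour per selected id
cols <- sapply(rainbow(length(selected.ids)), function(x) color.id(x)[1])
selected.df <- data.frame(id = selected.ids, col = cols, stringsAsFactors = FALSE)
# match() keeps each value paired with its own id; %in% would return the values in df order
selected.df$x <- df$val[match(selected.ids, df$id)]
selected.df <- selected.df[order(selected.df$x), ]
# height of the bar that each selected value falls into
selected.df$y <- h$counts[findInterval(selected.df$x, h$breaks)]
selected.df$col <- factor(selected.df$col, levels = cols)
plot(h)
# zero-length, thick segments act as point markers; as.character() passes colour names, not factor codes
segments(x0 = selected.df$x, x1 = selected.df$x, y0 = selected.df$y, y1 = selected.df$y, lwd = 8, col = as.character(selected.df$col))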

which gives:

But when I try the same with ggplot:

ggplot(df, aes(x = val)) +
  geom_histogram(bins = 10, colour = "black", alpha = 0, fill = "#FF6666") +
  geom_point(data = selected.df, aes(x = x, y = y, colour = factor(col)), size = 2) +
  scale_fill_manual(values = levels(selected.df$col), labels = selected.df$id, name = "id") +
  scale_colour_manual(values = levels(selected.df$col), labels = selected.df$id, name = "id")

the points and the histogram are not aligned:

Ideally, I would like to plot it as a step plot:

ggplot(df, aes(x = val)) +
  stat_bin(geom = "step", bins = 10) +
  geom_point(data = selected.df, aes(x = x, y = y, colour = factor(col)), size = 2) +
  scale_fill_manual(values = levels(selected.df$col), labels = selected.df$id, name = "id") +
  scale_colour_manual(values = levels(selected.df$col), labels = selected.df$id, name = "id")

which looks much like the geom_histogram version,

but I would also like the ends of the line to touch the y = 0 line.

So how do I get this right in a step plot using stat_bin?

Recommended answer

The y values in your selected.df are computed from the breaks that hist() uses, but geom_histogram() uses different breaks (to be clear, geom_histogram(bins = 10) is not equivalent to hist(breaks = 10)). In addition, in the step plot the line steps up and down at the midpoints of the bins, not at the breaks. ggplot_build(gg.obj)$data (or plot(gg.obj)$data, depending on the ggplot2 version) gives you the computed breaks, counts, and so on.
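The difference between the two sets of breaks can be inspected directly. The following is a minimal sketch that reuses the question's data. The names g, g.data and g.breaks are chosen here for illustration, and the exact break values depend on the installed ggplot2 version.

```r
library(ggplot2)

set.seed(10)
df <- data.frame(id = LETTERS, val = rnorm(length(LETTERS)))

# Base R: breaks = 10 is only a suggestion; hist() picks "pretty" break points.
h <- hist(df$val, plot = FALSE, breaks = 10)

# ggplot2: bins = 10 requests 10 equal-width bins derived from the data range.
g <- ggplot(df, aes(x = val)) + geom_histogram(bins = 10)
g.data <- ggplot_build(g)$data[[1]]   # computed layer data: xmin, x, xmax, count, ...
g.breaks <- c(g.data$xmin, tail(g.data$xmax, n = 1))

print(h$breaks)   # breaks used by hist()
print(g.breaks)   # breaks used by geom_histogram()
print(g.data[, c("xmin", "x", "xmax", "count")])
```

Because the two break vectors differ, a y value looked up in h$counts generally does not match the height of the ggplot bar drawn at the same x.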

geom_histogram
The approach is basically the same as in base graphics. If you want exactly the same output as the base plot, use breaks = h$breaks instead of bins = 10.
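As a rough sketch of the breaks = h$breaks variant, assuming the question's data and with g2 and g2.data as illustrative names, the ggplot2 counts can be compared with the base R counts side by side:

```r
library(ggplot2)

set.seed(10)
df <- data.frame(id = LETTERS, val = rnorm(length(LETTERS)))
h <- hist(df$val, plot = FALSE, breaks = 10)

# Hand the base R breaks to ggplot2 instead of asking for a number of bins.
g2 <- ggplot(df, aes(x = val)) +
  geom_histogram(breaks = h$breaks, colour = "black", fill = NA)
g2.data <- ggplot_build(g2)$data[[1]]

# With identical breaks, the bar heights should agree bin by bin.
print(cbind(base = h$counts, ggplot = g2.data$count))
```

With this variant the y values computed from h$counts in the question could be reused unchanged for geom_point.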

# common setup for base and ggplot2
library(plotrix)  # provides color.id()
library(ggplot2)
set.seed(10)
df <- data.frame(id = LETTERS, val = rnorm(length(LETTERS)))
selected.ids <- sample(LETTERS, 3, replace = FALSE)
cols <- sapply(rainbow(length(selected.ids)), function(x) color.id(x)[1])
selected.df <- data.frame(id = selected.ids, col = cols, stringsAsFactors = FALSE)
# match() keeps each value paired with its own id; %in% would return the values in df order
selected.df$x <- df$val[match(selected.ids, df$id)]
selected.df <- selected.df[order(selected.df$x), ]
selected.df$col <- factor(selected.df$col, levels = cols)

# (1) make a histogram
g <- ggplot(df, aes(x = val)) + geom_histogram(bins = 10, colour = "black", alpha = 0, fill = "#FF6666")
  # base: h <- hist(df$val, plot = FALSE, breaks = 10)

# (2) get its breaks
g.data <- ggplot_build(g)$data[[1]]
g.breaks <- c(g.data$xmin, tail(g.data$xmax, n = 1))
  # base: h$breaks

# (3) get the counts for the selected x values
selected.df$y <- g.data$count[findInterval(selected.df$x, g.breaks)]
  # base: selected.df$y <- h$counts[findInterval(selected.df$x, h$breaks)]

# (4) draw; labels follow the level order of col, which is the order of selected.ids
g + geom_point(data = selected.df, aes(x = x, y = y, colour = factor(col)), size = 2) +
  scale_fill_manual(values = levels(selected.df$col), labels = selected.ids, name = "id") +
  scale_colour_manual(values = levels(selected.df$col), labels = selected.ids, name = "id")

stat_bin
You can draw it the same way as with geom_histogram. The important point is that the line steps up and down at the bin midpoints, not at the breaks.
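The midpoint behaviour can be checked on the computed layer data. This is a small sketch under the same data assumptions as above. mids, x0 and i are illustrative names, and x0 is a made-up value, not one of the selected points.

```r
library(ggplot2)

set.seed(10)
df <- data.frame(id = LETTERS, val = rnorm(length(LETTERS)))

s <- ggplot(df, aes(x = val)) + stat_bin(geom = "step", bins = 10)
s.data <- ggplot_build(s)$data[[1]]

# The step line is drawn through (x, count), and x is the centre of each bin.
mids <- (s.data$xmin + s.data$xmax) / 2
print(all.equal(s.data$x, mids))

# The line keeps the height of bin i from midpoint i up to midpoint i + 1,
# so the midpoints, not the breaks, are the lookup intervals.
x0 <- 0   # hypothetical value, chosen only for illustration
i <- findInterval(x0, s.data$x)
print(s.data$count[i])   # height of the step line at x0
```

This is why the code that follows passes the midpoints rather than the breaks to findInterval().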

selected.df2 <- selected.df

# (1) make a step plot
s <- ggplot(df, aes(x = val)) + stat_bin(geom = "step", bins = 10)

# (2) get the breaks and the bin midpoints
s.data <- ggplot_build(s)$data[[1]]
s.breaks <- c(s.data$xmin, tail(s.data$xmax, n = 1))
s.mid.breaks <- s.data$x

# (3) get the counts for the selected x values, using the midpoints as interval boundaries
selected.df2$y <- s.data$count[findInterval(selected.df2$x, s.mid.breaks)]

# (4) add empty bins at both ends so that the line starts and ends at y = 0
s.add.breaks <- c(s.breaks[1] - 1.0E-6,    # a tiny empty bin is enough at the lower end
                  s.breaks,
                  tail(s.breaks, n = 1) + diff(s.breaks[1:2])) # the upper end needs a full-width empty bin

# (5) draw; labels follow the level order of col, which is the order of selected.ids
ggplot(df, aes(x = val)) + stat_bin(geom = "step", breaks = s.add.breaks) +
  geom_point(data = selected.df2, aes(x = x, y = y, colour = factor(col)), size = 2) +
  scale_fill_manual(values = levels(selected.df2$col), labels = selected.ids, name = "id") +
  scale_colour_manual(values = levels(selected.df2$col), labels = selected.ids, name = "id")
