I came across this great plot from FiveThirtyEight, in which the density plots for different universities overlap slightly. See this link at fivethirtyeight.com.

How would you replicate this plot with ggplot2? Specifically, how would you get this slight overlap? facet_wrap won't work here.

    TestFrame <- data.frame(
      Score = c(rnorm(100, 0, 1),
                rnorm(100, 0, 2),
                rnorm(100, 0, 3),
                rnorm(100, 0, 4),
                rnorm(100, 0, 5)),
      Group = c(rep('Ones', 100),
                rep('Twos', 100),
                rep('Threes', 100),
                rep('Fours', 100),
                rep('Fives', 100))
    )

    ggplot(TestFrame, aes(x = Score, group = Group)) +
      geom_density(alpha = .75, fill = 'black')

Best answer

As always with ggplot, the key is getting the data into the right format; after that the plotting is pretty straightforward. I'm sure there is another way of doing this, but my approach was to do the density estimation with density() and then build a sort of manual geom_density() out of geom_ribbon(), which takes a ymin and a ymax. Those are what let you move each shape off the x axis.

The remaining challenge was getting the drawing order right, since ggplot seems to draw the widest ribbon first. In the end, the part that needs the bulkiest code is producing the quartiles.

I've also generated some data that is a bit more consistent with the original figure.

    library(ggplot2)
    library(dplyr)

    rawdata <- data.frame(
      Score = rnorm(1000, seq(1, 0, length.out = 10), sd = 1),
      Group = rep(LETTERS[1:10], 100)  # 10 groups x 100 draws = 1000 rows
    )

    df <- rawdata %>%
      # factor() first so as.numeric() yields 1..10 (Group is a character column in R >= 4.0)
      # rev() means the ordering will be from top to bottom
      mutate(GroupNum = rev(as.numeric(factor(Group)))) %>%
      group_by(Group, GroupNum) %>%
      # The original has quite a large bandwidth
      do({
        d <- density(.$Score, bw = diff(range(.$Score)) / 20)
        data.frame(x = d$x, y = d$y)
      }) %>%
      ungroup() %>%
      mutate(ymin = GroupNum * (max(y) / 1.5),  # This constant controls how much overlap between groups there is
             ymax = y + ymin,
             ylabel = ymin + min(ymin) / 2,
             xlabel = min(x) - mean(range(x)) / 2)  # This constant controls how far to the left the labels are

    # Get quartiles
    labels <- rawdata %>%
      mutate(GroupNum = rev(as.numeric(factor(Group)))) %>%
      group_by(Group, GroupNum) %>%
      mutate(q1 = quantile(Score)[2],
             median = quantile(Score)[3],
             q3 = quantile(Score)[4]) %>%
      filter(row_number() == 1) %>%
      select(-Score) %>%
      left_join(df, by = c("Group", "GroupNum")) %>%
      # For each group, pick the density grid point closest to the median
      mutate(xmed = x[which.min(abs(x - median))],
             yminmed = ymin[which.min(abs(x - median))],
             ymaxmed = ymax[which.min(abs(x - median))]) %>%
      filter(row_number() == 1)

    p <- ggplot(df, aes(x, ymin = ymin, ymax = ymax)) +
      geom_text(data = labels, aes(xlabel, ylabel, label = Group)) +
      geom_vline(xintercept = 0, size = 1.5, alpha = 0.5, colour = "#626262") +
      geom_vline(xintercept = c(-2.5, -1.25, 1.25, 2.5), size = 0.75, alpha = 0.25, colour = "#626262") +
      theme(panel.grid = element_blank(),
            panel.background = element_rect(fill = "#F0F0F0"),
            axis.text.y = element_blank(),
            axis.ticks = element_blank(),
            axis.title = element_blank())

    # Add one group at a time so the drawing order is under our control
    for (i in unique(df$GroupNum)) {
      p <- p +
        geom_ribbon(data = df[df$GroupNum == i,], aes(group = GroupNum),
                    colour = "#F0F0F0", fill = "black") +
        # Dashed vertical line at the group's median
        geom_segment(data = labels[labels$GroupNum == i,],
                     aes(x = xmed, xend = xmed, y = yminmed, yend = ymaxmed),
                     colour = "#F0F0F0", linetype = "dashed") +
        # Horizontal baseline spanning the full x range
        geom_segment(data = labels[labels$GroupNum == i,],
                     x = min(df$x), xend = max(df$x),
                     aes(y = ymin, yend = ymin), size = 1.5, lineend = "round")
    }

    p <- p + geom_text(data = labels[labels$Group == "A",],
                       aes(xmed - xlabel/50, ylabel),
                       label = "Median", colour = "#F0F0F0",
                       hjust = 0, fontface = "italic", size = 4)
    p

Edit: I noticed that the original actually fudges things a little by stretching each distribution out with a horizontal line (if you look closely, you can see a join...). I added something similar with the second geom_segment() in the loop.
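As a stripped-down illustration of the same idea, the sketch below applies only the core steps (estimate each curve with density(), then lift it off the x axis through ymin/ymax for geom_ribbon()) to the question's TestFrame. It is not the answer's full figure: the seed, the names `ribbons` and `step`, the use of density()'s default bandwidth, and the white outline colour are all assumptions made for illustration, and the labels, median lines, and theme are left out.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
TestFrame <- data.frame(
  Score = c(rnorm(100, 0, 1), rnorm(100, 0, 2), rnorm(100, 0, 3),
            rnorm(100, 0, 4), rnorm(100, 0, 5)),
  Group = rep(c("Ones", "Twos", "Threes", "Fours", "Fives"), each = 100)
)

groups <- unique(TestFrame$Group)  # order of appearance: Ones ... Fives

# One density curve per group, using density()'s default bandwidth
ribbons <- do.call(rbind, lapply(seq_along(groups), function(i) {
  d <- density(TestFrame$Score[TestFrame$Group == groups[i]])
  data.frame(Group = groups[i], GroupNum = i, x = d$x, y = d$y)
}))

# Shift each curve up by a multiple of a common step. The step is the tallest
# peak divided by 1.5, so it is smaller than that peak and curves can overlap.
step <- max(ribbons$y) / 1.5
ribbons$ymin <- ribbons$GroupNum * step
ribbons$ymax <- ribbons$ymin + ribbons$y

# Add layers from the highest baseline down, so lower rows are drawn on top
p <- ggplot(mapping = aes(x, ymin = ymin, ymax = ymax))
for (i in rev(seq_along(groups))) {
  p <- p + geom_ribbon(data = ribbons[ribbons$GroupNum == i, ],
                       colour = "white", fill = "black")
}
p
```

Changing the divisor 1.5 is the quickest way to experiment: a larger divisor packs the baselines closer together and increases the overlap, while a divisor below 1 separates the rows completely.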