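I want to use ggplot to draw per-facet mean lines (on both the x and y axes) for each subset. However, I get the mean of the whole data set rather than the mean of each subset, and I don't know how to fix this.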

library(ggplot2)
hsb2 <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.csv", sep = ",", header = TRUE)
head(hsb2)
hsb2$gender <- as.factor(hsb2$female)

ggplot() +
  geom_point(aes(y = read,x = write,colour = gender),data=hsb2,size = 2.2,alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(),palette = 'Set1') +
  stat_smooth(aes(x = write,y = read),data=hsb2,colour = '#000000',size = 0.8,method = lm,formula = 'y ~ x') +
  geom_vline(aes(xintercept = mean(write)),data=hsb2,linetype = 3) +
  geom_hline(aes(yintercept = mean(read)),data=hsb2,linetype = 3) +
  facet_wrap(facets = ~gender)

Best answer

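One approach is to explicitly compute the means (of x and y) for each gender and store them as new columns in the original data frame. When the facets are then split by gender, each line is drawn at the right position for its panel.
Using tapply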

#compute the read and write means for each gender
read_means <- tapply(hsb2$read, hsb2$gender, mean)
write_means <- tapply(hsb2$write, hsb2$gender, mean)

#store it in the data frame
hsb2$read_mean <- ifelse(hsb2$gender==0, read_means[1], read_means[2])
hsb2$write_mean <- ifelse(hsb2$gender==0, write_means[1], write_means[2])
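The `ifelse` step above hard-codes two gender levels. As a hedged alternative (not part of the original answer), base R's `ave()` can fill in the group mean for every row directly. The sketch below uses a small made-up data frame, `toy`, as a stand-in for `hsb2`, so the names and values are assumptions for illustration only.

```r
# Toy stand-in for hsb2; the values are invented for illustration
toy <- data.frame(
  gender = factor(c(0, 0, 1, 1)),
  read   = c(40, 60, 50, 70),
  write  = c(30, 50, 60, 80)
)

# ave() returns a vector as long as its input, where each element
# is replaced by the mean of the group that row belongs to
toy$read_mean  <- ave(toy$read,  toy$gender, FUN = mean)
toy$write_mean <- ave(toy$write, toy$gender, FUN = mean)

print(toy)
```

Because `ave()` works per group regardless of how many levels the factor has, the same two lines would also apply if the faceting variable had more than two categories.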

An alternative to the lines above is to use ddply.
Using ddply from the plyr package
The new columns can be created in a single call.
library(plyr)
# ddply returns a new data frame, so assign the result back to hsb2
hsb2 <- ddply(hsb2, "gender", transform,
      read_mean  = mean(read),
      write_mean = mean(write))

Now pass the two new mean columns to the geom_vline and geom_hline calls in ggplot.
ggplot() +
  geom_point(aes(y = read,x = write,colour = gender),data=hsb2,size = 2.2,alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(),palette = 'Set1') +
  stat_smooth(aes(x = write,y = read),data=hsb2,colour = '#000000',
              size = 0.8,method = lm,formula = 'y ~ x') +
  geom_vline(aes(xintercept = write_mean),data=hsb2,linetype = 3) +
  geom_hline(aes(yintercept = read_mean),data=hsb2,linetype = 3) +
  facet_wrap(facets = ~gender)

This draws the dotted lines at each gender's own means within its facet.
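A variation on the same idea, offered as a sketch rather than as the answer's method: instead of repeating the means in every row, keep them in a small summary data frame with one row per gender and hand that to `geom_vline` and `geom_hline`. Because the summary frame carries the `gender` column, `facet_wrap` sends each line to the matching panel. The `toy` data and the name `gender_means` are made up for illustration.

```r
library(ggplot2)

# Toy stand-in for hsb2; the values are invented for illustration
toy <- data.frame(
  gender = factor(c(0, 0, 1, 1)),
  read   = c(40, 60, 50, 70),
  write  = c(30, 50, 60, 80)
)

# One row per gender, holding the mean of read and write
gender_means <- aggregate(cbind(read, write) ~ gender, data = toy, FUN = mean)

p <- ggplot(toy, aes(x = write, y = read)) +
  geom_point(aes(colour = gender), size = 2.2, alpha = 0.9) +
  # Each layer gets the summary frame; facetting matches rows by gender
  geom_vline(aes(xintercept = write), data = gender_means, linetype = 3) +
  geom_hline(aes(yintercept = read), data = gender_means, linetype = 3) +
  facet_wrap(~gender)

print(p)
```

This keeps the plotting data unchanged and draws one line per facet rather than one overplotted line per observation.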

Regarding "r - ggplot2: mean of the subset within facets instead of the global mean", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21412097/
