R hist vs geom_histogram breakpoints

Question

I am using both geom_histogram (ggplot2) and hist in R with the same breakpoints, but I get different graphs. I did a quick search. Does anyone know how the breaks are defined and why the two would differ?

The two calls below produce different plots.

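library(ggplot2)

set.seed(25)
# 500 normally distributed values, rounded to whole numbers
data <- data.frame(Mos = rnorm(500, mean = 25, sd = 8))
data$Mos <- round(data$Mos)

# Same break points passed to ggplot2 and to base R
pAge <- ggplot(data, aes(x = Mos))
pAge + geom_histogram(breaks = seq(0, 50, by = 2))
hist(data$Mos, breaks = seq(0, 50, by = 2))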

Thanks.

Recommended answer

To get the same histogram in ggplot2, specify binwidth inside geom_histogram and the breaks inside scale_x_continuous, so that the axis ticks sit on the bin edges.

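Additionally, hist and the histograms in ggplot2 use different defaults to build the intervals: hist closes each interval on the right, (a, b], while ggplot2 closes it on the left, [a, b). Because the data are rounded to whole numbers, many values fall exactly on a break, so the counts shift between neighbouring bins: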

               hist      ggplot2
           bin Freq     bin Freq
    1    (0,2]    0   [0,2)    0
    2    (2,4]    2   [2,4)    2
    3    (4,6]    2   [4,6)    1
    4    (6,8]    1   [6,8)    2
    5   (8,10]    6  [8,10)    2
    6  (10,12]    9 [10,12)    7
    7  (12,14]   24 [12,14)   17
    8  (14,16]   27 [14,16)   26
    9  (16,18]   39 [16,18)   31
    10 (18,20]   48 [18,20)   46
    11 (20,22]   52 [20,22)   43
    12 (22,24]   38 [22,24)   57
    13 (24,26]   44 [24,26)   36
    14 (26,28]   46 [26,28)   52
    15 (28,30]   39 [28,30)   39
    16 (30,32]   31 [30,32)   33
    17 (32,34]   30 [32,34)   26
    18 (34,36]   24 [34,36)   29
    19 (36,38]   18 [36,38)   27
    20 (38,40]    9 [38,40)   12
    21 (40,42]    5 [40,42)    6
    22 (42,44]    4 [42,44)    0
    23 (44,46]    1 [44,46)    5
    24 (46,48]    1 [46,48)    0
    25 (48,50]    0 [48,50)    1
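
The shift between the two columns comes from values that sit exactly on a break. A minimal sketch of the effect, using a small made-up vector (the values in `x` and the names `brks`, `right_closed`, `left_closed`, `hist_right` and `hist_left` are illustrative, not from the question):

```r
# Toy data (made up for illustration): whole numbers, several on a break
x <- c(2, 2, 3, 4, 4, 4, 5, 6)
brks <- seq(0, 8, by = 2)

# Right-closed bins (a, b], the default for hist() and cut()
right_closed <- table(cut(x, breaks = brks, right = TRUE))
# Left-closed bins [a, b)
left_closed <- table(cut(x, breaks = brks, right = FALSE))

print(right_closed)  # counts 2, 4, 2, 0
print(left_closed)   # counts 0, 3, 4, 1

# hist() applies the same rule; plot = FALSE returns the counts without drawing
hist_right <- hist(x, breaks = brks, plot = FALSE)$counts
hist_left  <- hist(x, breaks = brks, right = FALSE, plot = FALSE)$counts
```

Every value equal to 2, 4 or 6 moves one bin to the right when the intervals switch from right-closed to left-closed, which is the same pattern the table shows for the rounded Mos values.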

I included the argument right = FALSE so that the hist intervals are left-closed (right-open), as they are in ggplot2. I also added count labels to both plots, so it is easier to check that the intervals are the same.

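# ggplot2: bin width of 2, axis ticks at the same break points, counts as labels
# (after_stat(count) replaces the older ..count.. notation in ggplot2 >= 3.3.0)
ggplot(data, aes(x = Mos)) +
  geom_histogram(binwidth = 2, colour = "black", fill = "white") +
  scale_x_continuous(breaks = seq(0, 50, by = 2)) +
  stat_bin(binwidth = 2, aes(label = after_stat(count)), vjust = -0.5, geom = "text")

# Base R: same breaks, left-closed intervals, counts printed above the bars
hist(data$Mos, breaks = seq(0, 50, by = 2), labels = TRUE, right = FALSE)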
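
The interval defaults described in this answer belong to the ggplot2 version in use when it was written. Recent ggplot2 releases let stat_bin take a closed argument, so the side that is closed can be stated explicitly instead of relying on a default. The sketch below assumes such a release; the names `brks`, `p`, `gg_counts` and `base_counts` are illustrative. It compares the bin counts numerically instead of by eye:

```r
library(ggplot2)

set.seed(25)
data <- data.frame(Mos = round(rnorm(500, mean = 25, sd = 8)))
brks <- seq(0, 50, by = 2)

# Explicit breaks and an explicit closed side (assumes stat_bin supports `closed`)
p <- ggplot(data, aes(x = Mos)) +
  geom_histogram(breaks = brks, closed = "left", colour = "black", fill = "white")

# layer_data() returns the computed bins without drawing; print(p) draws the plot
gg_counts <- layer_data(p)$count

# Base R counts with the same breaks and left-closed intervals
base_counts <- hist(data$Mos, breaks = brks, right = FALSE, plot = FALSE)$counts

# TRUE when both functions put every value in the same bin
all(gg_counts == base_counts)
```

Passing the breaks directly also avoids depending on where binwidth alone places the first bin edge, which is another point where versions may differ.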

To check the frequencies in each bin:

# Tabulate the left-closed bins; the variable name must match in both lines
freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))

