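New R user here. I'm trying to split a dataset into deciles with cut, following the process described in this question. I want to add the decile values as a new column in the data frame, but when I do, the lowest value is listed as NA for some reason. This happens whether I set include.lowest=TRUE or FALSE. Does anyone know why?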

It also happens with this sample set, so it isn't specific to my data.


> decile <- cut(data, quantile(data, (0:10)/10, labels=TRUE, include.lowest=FALSE))

> df <- cbind(data, decile)

> df

      data decile
 [1,]    1     NA
 [2,]    2      1
 [3,]    3      2
 [4,]    4      2
 [5,]    5      3
 [6,]    6      3
 [7,]    7      4
 [8,]    8      4
 [9,]    9      5
[10,]   10      5
[11,]   11      6
[12,]   12      6
[13,]   13      7
[14,]   14      7
[15,]   15      8
[16,]   16      8
[17,]   17      9
[18,]   18      9
[19,]   19     10
[20,]   20     10

Best answer

There are two problems. First, your cut call is malformed. I think you meant

cut(data, quantile(data, (0:10)/10), include.lowest=FALSE)
##                                ^missing parenthesis

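With the parenthesis misplaced, labels and include.lowest are passed to quantile(), which silently ignores them, so cut() falls back to its defaults. That is why changing include.lowest appeared to make no difference. Also, labels should be FALSE, NULL, or a vector of the desired labels with one entry per interval, that is, of length length(breaks) - 1.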
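
To illustrate both points, here is a minimal sketch. It assumes the sample vector is 1:20 (inferred from the printed output in the question, which does not show how data was defined), and the "D1" to "D10" labels are hypothetical names chosen for the example.

```r
# Assumption: the sample vector is 1:20, as the question's output suggests.
data <- 1:20
breaks <- quantile(data, (0:10) / 10)

# Extra arguments handed to quantile() are absorbed by its `...` and ignored,
# so the misplaced parenthesis changes nothing about the breaks themselves.
misplaced <- quantile(data, (0:10) / 10, labels = TRUE, include.lowest = FALSE)
print(identical(misplaced, breaks))

# Custom labels: one per interval, i.e. length(breaks) - 1 of them.
# "D1".."D10" are hypothetical labels used only for illustration.
my_labels <- paste0("D", seq_len(length(breaks) - 1))
decile <- cut(data, breaks, labels = my_labels, include.lowest = TRUE)
print(levels(decile))
```

Passing labels = FALSE instead would return plain integer codes rather than a factor.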

Second, the main problem is that you have set include.lowest=FALSE and data[1] is 1, which coincides with the first break:
> quantile(data, (0:10)/10)
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
 1.0  2.9  4.8  6.7  8.6 10.5 12.4 14.3 16.2 18.1 20.0
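The value 1 therefore does not fall into any category: the intervals are open on the left, so it lies just outside the lower bound of the first category defined by your breaks.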
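
The effect can be reproduced in a few lines. This is a sketch that assumes the sample vector is 1:20 (inferred from the printed output; the question does not show its definition) and uses labels = FALSE only to make the result easier to read as integer codes.

```r
# Assumption: the sample vector is 1:20, as the question's output suggests.
data <- 1:20
breaks <- quantile(data, (0:10) / 10)

# By default the intervals are left-open and right-closed: (a, b].
# The minimum equals the first break, so it belongs to no interval.
codes_default <- cut(data, breaks, labels = FALSE)
print(head(codes_default))

# include.lowest = TRUE closes the first interval on the left: [a, b].
codes_closed <- cut(data, breaks, labels = FALSE, include.lowest = TRUE)
print(head(codes_closed))
```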

I'm not sure exactly what you want, but you could try one of these two alternatives, depending on which class you want 1 to fall into:
> cut(data, quantile(data, (0:10)/10), include.lowest=TRUE)
 [1] [1,2.9]     [1,2.9]     (2.9,4.8]   (2.9,4.8]   (4.8,6.7]   (4.8,6.7]
 [7] (6.7,8.6]   (6.7,8.6]   (8.6,10.5]  (8.6,10.5]  (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20]   (18.1,20]
10 Levels: [1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] (8.6,10.5] ... (18.1,20]
> cut(data, c(0, quantile(data, (0:10)/10)), include.lowest=FALSE)
 [1] (0,1]       (1,2.9]     (2.9,4.8]   (2.9,4.8]   (4.8,6.7]   (4.8,6.7]
 [7] (6.7,8.6]   (6.7,8.6]   (8.6,10.5]  (8.6,10.5]  (10.5,12.4] (10.5,12.4]
[13] (12.4,14.3] (12.4,14.3] (14.3,16.2] (14.3,16.2] (16.2,18.1] (16.2,18.1]
[19] (18.1,20]   (18.1,20]
11 Levels: (0,1] (1,2.9] (2.9,4.8] (4.8,6.7] (6.7,8.6] ... (18.1,20]
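
If the goal is a numeric decile column in a data frame, something along these lines may work. It is a sketch under two assumptions: the sample vector is 1:20 (inferred from the question's output), and the first alternative above (include.lowest=TRUE) is the one you want.

```r
# Assumption: the sample vector is 1:20, as the question's output suggests.
data <- 1:20

# labels = FALSE returns integer codes 1..10 instead of interval labels.
decile <- cut(data, quantile(data, (0:10) / 10),
              labels = FALSE, include.lowest = TRUE)

# data.frame() keeps each column's own type; cbind() on two vectors
# builds a matrix, which is why the question's output shows [1,], [2,], ...
df <- data.frame(data = data, decile = decile)
print(df)
```

Using data.frame() rather than cbind() also means the column would stay a factor if you later switch back to interval labels.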

Source: a similar question on Stack Overflow about getting NA when adding a decile column with cut(): https://stackoverflow.com/questions/17932617/
