Problem description
What's a good way to cut() a quantitative variable into levels, including a final level dedicated to NAs?
I'd prefer something like the .missing parameter that tidyverse functions commonly offer (e.g., dplyr::recode() and dplyr::if_else()).
If the input is w and this hypothetical function is named cut_with_nas, then the following code
w <- c(0L, NA_integer_, 22:25, NA_integer_, 40)
cut_with_nas(w, breaks=2)
would produce the desired output:
[1] (-0.04,20] Unknown (20,40] (20,40] (20,40] (20,40] Unknown (20,40]
Levels: (-0.04,20] (20,40] Unknown
I'm posting a function below that accomplishes this, but I was hoping for a more concise solution, or at least a tested function that already exists in a package.
Recommended answer
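cut_with_nas <- function(x, breaks, labels = NULL, .missing = "Unknown") {
  # Bin x as usual; NA inputs stay NA in the resulting factor.
  y <- cut(x, breaks, labels)  # optionally: include.lowest = TRUE, right = FALSE
  # Append NA as an explicit final level, then rename that level.
  y <- addNA(y)
  levels(y)[is.na(levels(y))] <- .missing
  y
}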
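As a quick check, the function can be run on the w vector from the question. This is only a minimal sketch: the variable name z is made up for illustration, and the function definition is repeated so the snippet runs on its own.

```r
# Definition repeated from the answer so this snippet is self-contained.
cut_with_nas <- function(x, breaks, labels = NULL, .missing = "Unknown") {
  y <- cut(x, breaks, labels)
  y <- addNA(y)                            # make NA an explicit (last) level
  levels(y)[is.na(levels(y))] <- .missing  # rename that level
  y
}

w <- c(0L, NA_integer_, 22:25, NA_integer_, 40)
z <- cut_with_nas(w, breaks = 2)           # z is an illustrative name

levels(z)             # the .missing label is the final level
table(z)              # counts per level, including "Unknown"
sum(z == "Unknown")   # number of inputs that were NA
```

Because addNA() appends the NA level after the levels produced by cut(), the renamed level always ends up last, which matches the "final level dedicated to NAs" requested in the question.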
Most of this function borrows heavily from a response by @akrun three years ago.
(And a little from this unanswered question too.)