Problem description

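The great findInterval() function in R uses left-closed sub-intervals in its vec argument. As its docs put it, if i <- findInterval(x, vec), then for each index j, vec[i[j]] <= x[j] < vec[i[j] + 1], where vec[0] := -Inf, vec[N + 1] := +Inf and N <- length(vec).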
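The example below makes the left-closed behaviour concrete. It feeds findInterval() the breakpoints themselves: each one falls into the interval it opens, so 5 is counted in [5, 6) rather than (2, 5]. A right-closed version would give 0 1 2 3 for the same input.

```r
vec <- c(2, 5, 6, 35)
# Each breakpoint lands in the interval that starts at it: [2,5) -> 1, [5,6) -> 2, ...
findInterval(c(2, 5, 6, 35), vec)
# [1] 1 2 3 4
```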

If I want right-closed sub-intervals, what are my options? The best I've come up with is this:

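findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  # pmax() avoids index 0, which vec[] would silently drop (misaligning the
  # comparison); when fi is 0, x < vec[1], so the equality test is FALSE anyway.
  fi - (x == vec[pmax(fi, 1L)])
}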

Another one that also works:

findInterval.rightClosed2 <- function(x, vec, ...) {
  length(vec) - findInterval(-x, -rev(vec), ...)
}
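Here is one way to see why the second version works. Negating x and the reversed breakpoints mirrors the number line, so a left-closed interval [-b, -a) in the mirrored vector corresponds to the right-closed (a, b] in the original. Subtracting from length(vec) then converts the mirrored index back. The trace below uses a value sitting exactly on a breakpoint.

```r
vec <- c(2, 5, 6, 35)
x   <- 5                          # exactly on a breakpoint
-rev(vec)                         # -35 -6 -5 -2: reversed and negated, so ascending again
findInterval(-x, -rev(vec))       # 3, because -5 <= -5 < -2 (left-closed in mirrored space)
length(vec) - findInterval(-x, -rev(vec))  # 1, i.e. 5 lies in (2, 5]
```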

Here is a small test:

x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
findInterval(x, vec)
# [1] 1 3 3 3 3 4 4
findInterval.rightClosed(x, vec)
# [1] 1 2 3 3 3 4 4
findInterval.rightClosed2(x, vec)
# [1] 1 2 3 3 3 4 4

But I'd like to see any other solutions if there's a better one. By "better", I mean "somehow more satisfying" or "doesn't feel like a kludge" or maybe even "more efficient". =)

(Note that there's a rightmost.closed argument to findInterval(), but it's different - it only refers to the final sub-interval and has a different meaning.)
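In R 3.3.0 and later, findInterval() also has a left.open argument. With left.open = TRUE, the sub-intervals are right-closed, (vec[i], vec[i + 1]]. The sketch below runs it on the small test above and assumes R >= 3.3.0.

```r
x   <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
# left.open = TRUE: x gets index i when vec[i] < x <= vec[i + 1]
findInterval(x, vec, left.open = TRUE)
# [1] 1 2 3 3 3 4 4
```

When left.open = TRUE, the meaning of rightmost.closed is also flipped: per the documentation, it then closes the leftmost interval instead.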

Recommended answer

Major clean-up in all aisles.

You might look at cut. By default, cut creates intervals that are open on the left and closed on the right, (a, b]. The right argument changes this: right = FALSE gives [a, b). Using your example:

x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
cutVec <- c(vec, max(x)) # cut returns NA outside the breaks, so the breaks must span all of x

Now create four functions that should do the same thing: two from the OP, one from Josh O'Brien, and one based on cut. Two arguments to cut differ from their defaults. include.lowest = TRUE closes the smallest (leftmost) interval on both sides. labels = FALSE makes cut return the integer bin codes instead of a factor, which is what it returns otherwise.
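The next example shows how those two arguments behave on a small, made-up input. Values outside the range of the breaks come back as NA, which is why the range of the breaks has to cover all of x. include.lowest = TRUE puts the lowest breakpoint (2) into the first bin.

```r
vec <- c(2, 5, 6, 35)
x   <- c(1, 2, 3, 5, 6, 40)
# Right-closed bins [2,5], (5,6], (6,35], returned as integer codes
cut(x, vec, include.lowest = TRUE, labels = FALSE)
# [1] NA  1  1  1  2 NA
```

The numbering differs from the findInterval-based functions at the low end. Those return 0 for values at or below vec[1], whereas cut returns 1 for vec[1] itself (because of include.lowest) and NA below it. The comparisons that follow avoid this case: either no x equals the first break, or the first break is -Inf.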

findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  fi - (x==vec[fi])
}
findInterval.rightClosed2 <- function(x, vec, ...) {
  length(vec) - findInterval(-x, -rev(vec), ...)
}
cutFun <- function(x, vec){
    cut(x, vec, include.lowest = TRUE, labels = FALSE)
}
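# The body of fiFun is a contribution by Josh O'Brien that has since been lost.
# Nudging each (positive) breakpoint up by a relative machine epsilon makes a value
# equal to a breakpoint fall below it, turning [a, b) into (effectively) (a, b].
fiFun <- function(x, vec){
    findInterval(x, vec * (1 + .Machine$double.eps))
}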
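The epsilon trick relies on the breakpoints being positive. Multiplying by (1 + eps) moves a negative breakpoint further down and leaves 0 where it is, so values sitting on those breakpoints stay in the wrong bin. The sketch below repeats fiFun so that it runs on its own. It uses left.open = TRUE (R >= 3.3.0) as the right-closed reference.

```r
# Same definition as fiFun above
fiFun <- function(x, vec) findInterval(x, vec * (1 + .Machine$double.eps))

vec <- c(-2, 0, 5)
x   <- c(-2, 0, 5)              # every x sits exactly on a breakpoint
fiFun(x, vec)                   # 1 2 2: only the positive breakpoint 5 is handled
findInterval(x, vec, left.open = TRUE)  # 0 1 2: the right-closed answer
```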

Do all functions return the same result? Yes. (Note the use of cutVec for cutFun.)

mapply(identical, list(findInterval.rightClosed(x, vec)),
  list(findInterval.rightClosed2(x, vec), cutFun(x, cutVec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE

Now a more demanding vector to bin:

x <- rpois(2e6, 10)
vec <- c(-Inf, quantile(x, seq(.2, 1, .2)))

Test whether the results are identical (note the use of unname):

mapply(identical, list(unname(findInterval.rightClosed(x, vec))),
  list(findInterval.rightClosed2(x, vec), cutFun(x, vec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE
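Here is why unname is needed. quantile() returns a named vector, so vec carries names such as "50%", and findInterval.rightClosed passes them on through vec[fi]. The comparison x == vec[fi] takes its names from vec[fi]. The other three functions return unnamed integers. The small data below are made up for illustration.

```r
findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  fi - (x == vec[fi])
}

vec <- c(-Inf, quantile(1:10, c(0.5, 1)))  # c(-Inf, "50%" = 5.5, "100%" = 10)
r   <- findInterval.rightClosed(c(3, 5.5, 10), vec)
names(r)     # inherited from vec[fi]: "", "50%", "100%"
unname(r)    # 1 1 2
```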

And the benchmark:

library(microbenchmark)
microbenchmark(findInterval.rightClosed(x, vec), findInterval.rightClosed2(x, vec),
  cutFun(x, vec), fiFun(x, vec), times = 50)
# Unit: milliseconds
#                                expr       min        lq    median        uq       max
# 1                    cutFun(x, vec)  35.46261  35.63435  35.81233  36.68036  53.52078
# 2                     fiFun(x, vec)  51.30158  51.69391  52.24277  53.69253  67.09433
# 3  findInterval.rightClosed(x, vec) 124.57110 133.99315 142.06567 155.68592 176.43291
# 4 findInterval.rightClosed2(x, vec)  79.81685  82.01025  86.20182  95.65368 108.51624

From this run, cut seems to be the fastest.
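On R 3.3.0 or later, the built-in left.open = TRUE option could be added to this comparison. The sketch below uses base R's system.time rather than microbenchmark, and the seed is an arbitrary choice to make the data reproducible. It first checks that left.open agrees with cutFun on the demanding vector, then times both. No timings are claimed here because they depend on the machine.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x   <- rpois(2e6, 10)
vec <- c(-Inf, quantile(x, seq(.2, 1, .2)))

cutFun <- function(x, vec) cut(x, vec, include.lowest = TRUE, labels = FALSE)

res_cut  <- cutFun(x, vec)
res_open <- findInterval(x, vec, left.open = TRUE)
stopifnot(identical(res_cut, res_open))  # same right-closed bin codes

# Rough timings: 10 repetitions each
system.time(for (i in 1:10) cutFun(x, vec))
system.time(for (i in 1:10) findInterval(x, vec, left.open = TRUE))
```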
