我试图在R中找到下上限的索引。
这是findInterval解决的相同问题,但是findInterval检查它的参数是否已排序,我想避免这种情况,因为我知道它已排序。
我试图直接调用底层的C函数,但是对于应该调用findInterval还是find_interv_vec感到困惑。
另外,我尝试拨打电话,但似乎找不到该功能
findInterval2 <- function (x, vec, rightmost.closed = FALSE, all.inside = TRUE)
{
nx <- length(x)
index <- integer(nx)
.C('find_interv_vec', xt=as.double(vec), n=length(vec),
x=as.double(x), nx=nx, as.logical(rightmost.closed),
as.logical(all.inside), index, DUP = FALSE, NAOK=T,
PACKAGE='base')
index
}
我懂了
Error in .C("find_interv_vec", xt = as.double(vec), n = length(vec), x = as.double(x), :
"find_interv_vec" not available for .C() for package "base"
另一方面,我读到使用.Call比旧的.C更好,特别是因为.C复制并且我的vec很大。我应该如何构造对.Call的调用?
谢谢!
最佳答案
经过研究和@MartinMorgan的非常有用的答案,我决定做与他的答案类似的事情。
我创建了一些模拟findInterval的函数,而不检查vec是否已排序。显然,当x的长度为1且您一遍又一遍地调用它时,这会有很大的不同。如果x的长度>> 1,并且您可以利用 vector 化,则findInterval仅检查一次是否对vec进行了排序。
在以下代码块中,我创建了一些查找间隔的变体
之后,我创建了两个要测试的函数
对于testVec,我创建的所有函数都通过Vectorize在x参数中进行了 vector 化。
在那之后,我用微基准测试为执行计时。
码
require(inline)
# findInterval written in R as a binary search
findInterval2 <- function(x,v) {
n = length(v)
if (x<v[1])
return (0)
if (x>=v[n])
return (n)
i=1
k=n
while({j = (k-i) %/% 2 + i; !(v[j] <= x && x < v[j+1])}) {
if (x < v[j])
k = j
else
i = j+1
}
return (j)
}
findInterval2Vec = Vectorize(findInterval2,vectorize.args="x")
#findInterval2 compilated with cmpfun
findInterval2Comp <- cmpfun(findInterval2)
findInterval2CompVec <- Vectorize(findInterval2Comp,vectorize.args="x")
findInterval2VecComp <- cmpfun(findInterval2Vec)
findInterval2CompVecComp <- cmpfun(findInterval2CompVec)
sig <-signature(x="numeric",v="numeric",n="integer",idx="integer")
code <- "
if (*x < v[0]) {
*idx = -1;
return;
}
if (*x >= v[*n-1]) {
*idx = *n-1;
return;
}
int i,j,k;
i = 0;
k = *n-1;
while (j = (k-i) / 2 + i, !(v[j] <= *x && *x < v[j+1])) {
if (*x < v[j]) {
k = j;
}
else {
i = j+1;
}
}
*idx=j;
return;
"
fn <- cfunction(sig=sig,body=code,language="C",convention=".C")
# findInterval written in C
findIntervalC <- function(x,v) {
idx = as.integer(-1)
as.integer((fn(x,v,length(v),idx)$idx)+1)
}
findIntervalCVec <- Vectorize(findIntervalC,vectorize.args="x")
# The test case where x is of length 1 and you call findInterval several times
testByOne <- function(f,reps = 100, vlength = 300000, xs = NULL) {
if (is.null(xs))
xs = seq(from=1,to=vlength-1,by=vlength/reps)
v = 1:vlength
for (x in xs)
f(x,v)
}
# The test case where you can take advantage of vectorization
testVec <- function(f,reps = 100, vlength = 300000, xs = NULL) {
if (is.null(xs))
xs = seq(from=1,to=vlength-1,by=vlength/reps)
v = 1:vlength
f(xs,v)
}
标杆管理
microbenchmark(fi=testByOne(findInterval),fi2=testByOne(findInterval2),fi2comp=testByOne(findInterval2Comp),fic=testByOne(findIntervalC))
Unit: milliseconds
expr min lq median uq max neval
fi 617.536422 648.19212 659.927784 685.726042 754.12988 100
fi2 11.308138 11.60319 11.734305 12.067857 71.98640 100
fi2comp 2.293874 2.52145 2.637388 5.036558 62.01111 100
fic 368.002442 380.81847 416.137318 424.250337 474.31542 100
microbenchmark(fi=testVec(findInterval),fi2=testVec(findInterval2Vec),fi2compVec=testVec(findInterval2CompVec),fi2vecComp=testVec(findInterval2VecComp),fic=testByOne(findIntervalCVec))
Unit: milliseconds
expr min lq median uq max neval
fi 4.218191 4.986061 6.875732 10.216228 68.51321 100
fi2 12.982914 13.786563 16.738707 19.102777 75.64573 100
fi2compVec 4.264839 4.650925 4.902277 9.892413 13.32756 100
fi2vecComp 13.000124 13.689418 14.072334 18.911659 76.19146 100
fic 840.446529 893.445185 908.549874 919.152187 1047.84978 100
一些观察
关于r - R中下限的快速索引,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/17098124/