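I want to find the positions in a vector where the value differs by more than some threshold from an earlier point in the vector. The first change point should be measured relative to the first value in the vector. Each subsequent change point should be measured relative to the previous change point.
I can do this with a for loop, but I wonder whether there is a more idiomatic and faster vectorized solution.
Minimal example: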
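set.seed(123)
x = cumsum(rnorm(500))   # random walk used as test data
mindiff = 5.0            # threshold for a change point
start = x[1]             # reference value, initially the first element
changepoints = integer()
for (i in seq_along(x)) {  # seq_along() is safe even if x is empty
  if (abs(x[i] - start) > mindiff) {
    changepoints = c(changepoints, i)
    start = x[i]         # later points are measured against this change point
  }
}
plot(x, type = 'l')
points(changepoints, x[changepoints], col = 'red')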
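To make the rule easy to check by hand, here is a minimal sketch that wraps the same loop in a function and runs it on a tiny vector. The name find_changepoints and the vector y are made up for illustration and are not part of the original post. The result vector is preallocated to its worst-case length instead of being grown with c().

```r
# Hypothetical helper: find_changepoints is an illustrative name only.
find_changepoints <- function(x, mindiff) {
  start <- x[1]               # reference value, initially the first element
  out <- integer(length(x))   # preallocate for the worst case
  n <- 0L                     # number of change points found so far
  for (i in seq_along(x)) {
    if (abs(x[i] - start) > mindiff) {
      n <- n + 1L
      out[n] <- i
      start <- x[i]           # measure later points against this change point
    }
  }
  out[seq_len(n)]             # keep only the filled part
}

y <- c(0, 2, 6, 7, 12, 11, 5)
find_changepoints(y, mindiff = 5)
# [1] 3 5 7
```

Position 3 differs from the first value (0) by 6, position 5 differs from the value at position 3 (6) by 6, and position 7 differs from the value at position 5 (12) by 7.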
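Best answer
Implementing the same loop in Rcpp speeds it up: in the benchmark below the Rcpp version is roughly ten times faster at the median.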
library(Rcpp)
cppFunction(
"IntegerVector foo(NumericVector vect, double difference){
    int start = 0;  // index of the current reference value
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
        if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
            changepoints.push_back(i + 1);  // convert to R's 1-based index
            start = i;
        }
    }
    return(changepoints);
}"
)
foo(vect = x, difference = mindiff)
# [1] 17 25 56 98 108 144 288 297 307 312 403 470 487
identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE
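On the "more idiomatic" part of the question: each change point depends on the previous one, so the computation is sequential by nature and does not map onto ordinary vectorized arithmetic. As a loop-free alternative in base R, here is a sketch that carries the reference value along with Reduce(). The name changepoints_reduce is hypothetical, the sketch assumes a non-empty x and mindiff >= 0, and it is not expected to be faster than the plain loop.

```r
# Hypothetical helper: changepoints_reduce is an illustrative name only.
# Assumes x is non-empty and mindiff >= 0.
changepoints_reduce <- function(x, mindiff) {
  # ref[i] is the reference value in force after x[i] has been examined
  ref <- unlist(Reduce(
    function(start, xi) if (abs(xi - start) > mindiff) xi else start,
    x, accumulate = TRUE
  ))
  # the reference value changes exactly where a change point occurs
  which(diff(ref) != 0) + 1L
}

changepoints_reduce(c(0, 2, 6, 7, 12, 11, 5), mindiff = 5)
# [1] 3 5 7
```

Since a change point requires a difference strictly greater than mindiff, the reference value always changes at a change point, which is why diff(ref) != 0 recovers the positions.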
Benchmark
#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0
library(microbenchmark)
microbenchmark(
  baseR = {
    start = x[1]
    changepoints = integer()
    for (i in 1:length(x)) {
      if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
      }
    }
  },
  Rcpp = foo(vect = x, difference = mindiff)
)
#Unit: milliseconds
#  expr        min        lq      mean    median        uq      max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#  Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a
Regarding "r - find points in a vector where the change exceeds a threshold", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45866170/