问题描述
zoo::rollmean
是一个有用的函数,它返回时间序列的滚动平均值;对于长度为 n
和窗口大小为 k
的向量 x
,它返回向量 c(mean(x[1:k]),mean(x[2:(k+1)]), ..., mean(x[(n-k+1):n]))
.
zoo::rollmean
is a helpful function that returns the rolling mean of a time series; for vector x
of length n
and window size k
it returns the vector c(mean(x[1:k]), mean(x[2:(k+1)]), ..., mean(x[(n-k+1):n]))
.
我注意到我正在开发的一些代码似乎运行缓慢,所以我使用 Rcpp 包和一个简单的 for 循环编写了自己的版本:
I noticed that it seemed to be running slowly for some code I was developing, so I wrote my own version using the Rcpp package and a simple for loop:
library(Rcpp)
cppFunction("NumericVector rmRcpp(NumericVector dat, const int window) {
const int n = dat.size();
NumericVector ret(n-window+1);
double summed = 0.0;
for (int i=0; i < window; ++i) {
summed += dat[i];
}
ret[0] = summed / window;
for (int i=window; i < n; ++i) {
summed += dat[i] - dat[i-window];
ret[i-window+1] = summed / window;
}
return ret;
}")
令我惊讶的是,这个版本的函数比 zoo::rollmean
函数快得多:
To my surprise, this version of the function is much faster than the zoo::rollmean
function:
# Time series with 1000 elements
set.seed(144)
y <- rnorm(1000)
x <- 1:1000
library(zoo)
zoo.dat <- zoo(y, x)
# Make sure our function works
all.equal(as.numeric(rollmean(zoo.dat, 3)), rmRcpp(y, 3))
# [1] TRUE
# Benchmark
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), rmRcpp(y, 3))
# Unit: microseconds
# expr min lq mean median uq max neval
# rollmean(zoo.dat, 3) 685.494 904.7525 1776.88666 1229.2475 1744.0720 15724.321 100
# rmRcpp(y, 3) 6.638 12.5865 46.41735 19.7245 27.4715 2418.709 100
即使对于更大的向量也能保持加速:
The speedup holds even for much larger vectors:
# Time series with 5 million elements
set.seed(144)
y <- rnorm(5000000)
x <- 1:5000000
library(zoo)
zoo.dat <- zoo(y, x)
# Make sure our function works
all.equal(as.numeric(rollmean(zoo.dat, 3)), rmRcpp(y, 3))
# [1] TRUE
# Benchmark
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), rmRcpp(y, 3), times=10)
# Unit: milliseconds
# expr min lq mean median uq max
# rollmean(zoo.dat, 3) 2825.01622 3090.84353 3191.87945 3206.00357 3318.98129 3616.14047
# rmRcpp(y, 3) 31.03014 39.13862 42.67216 41.55567 46.35191 53.01875
为什么一个简单的 Rcpp
实现的运行速度比 zoo::rollmean
快 100 倍?
Why does a simple Rcpp
implementation run ~100x faster than zoo::rollmean
?
推荐答案
感谢 @DirkEddelbuettel 指出问题中的比较并不是最公平的,因为我将 C++ 代码与纯 R 代码进行了比较.下面是一个简单的基本 R 实现(没有来自 zoo 包的所有检查);这与 zoo::rollmean
如何实现滚动均值的核心计算非常相似:
Thanks to @DirkEddelbuettel for pointing out that the comparison made in the question wasn't the most fair because I was comparing C++ code to pure R code. The following is a simple base R implementation (without all the checks from the zoo package); this is quite similar to how zoo::rollmean
implements the core computation for the rolling mean:
baseR.rollmean <- function(dat, window) {
n <- length(dat)
y <- dat[window:n] - dat[c(1, 1:(n-window))]
y[1] <- sum(dat[1:window])
return(cumsum(y) / window)
}
与 zoo:rollmean
相比,我们看到这仍然要快很多:
Comparing to zoo:rollmean
, we see that this is still a good deal faster:
set.seed(144)
y <- rnorm(1000000)
x <- 1:1000000
library(zoo)
zoo.dat <- zoo(y, x)
all.equal(as.numeric(rollmean(zoo.dat, 3)), baseR.rollmean(y, 3), RcppRoll::roll_mean(y, 3), rmRcpp(y, 3))
# [1] TRUE
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), baseR.rollmean(y, 3), RcppRoll::roll_mean(y, 3), rmRcpp(y, 3), times=10)
# Unit: milliseconds
# expr min lq mean median uq max neval
# rollmean(zoo.dat, 3) 507.124679 516.671897 646.813716 563.897005 593.861499 1220.08272 10
# baseR.rollmean(y, 3) 46.156480 47.804786 53.923974 49.250144 55.061844 76.47908 10
# RcppRoll::roll_mean(y, 3) 7.714032 8.513042 9.014886 8.693255 8.885514 11.32817 10
# rmRcpp(y, 3) 7.729959 8.045270 8.924030 8.388931 8.996384 12.49042 10
为了深入了解为什么我们在使用 base R 时看到了 10 倍的加速,我使用了 Hadley 的 lineprof 工具,在需要的地方从 zoo
包源中抓取源代码:
To delve into why we were seeing a 10x speedup while using base R, I used Hadley's lineprof tool, grabbing source code from the zoo
package source where needed:
lineprof(rollmean.zoo(zoo.dat, 3))
# time alloc release dups ref src
# 1 0.001 0.954 0 26 #27 rollmean.zoo/unclass
# 2 0.001 0.954 0 0 #28 rollmean.zoo/:
# 3 0.002 0.954 0 1 #28 rollmean.zoo
# 4 0.001 1.431 0 0 #28 rollmean.zoo/seq_len
# 5 0.001 0.000 0 0 #28 rollmean.zoo/c
# 6 0.006 2.386 0 1 #28 rollmean.zoo
# 7 0.002 0.954 0 2 #31 rollmean.zoo/cumsum
# 8 0.001 0.000 0 0 #31 rollmean.zoo//
# 9 0.005 1.912 0 1 #33 rollmean.zoo
# 10 0.013 2.898 0 14 #33 rollmean.zoo/[<-
# 11 0.299 28.941 0 127 #34 rollmean.zoo/na.fill
很明显,几乎所有的时间都花在了 na.fill
函数上,该函数实际上是在已经计算出滚动平均值之后调用的.
Clearly almost all the time is being spent in the na.fill
function, which is actually called after the rolling mean values have already been computed.
lineprof(na.fill.zoo(zoo.dat, fill=NA, 2:999999))
# time alloc release dups ref src
# 1 0.004 1.913 0 39 #26 na.fill.zoo/seq
# 2 0.002 1.921 0 9 #33 na.fill.zoo/coredata
# 3 0.002 1.921 0 6 #37 na.fill.zoo/[<-
# 4 0.001 0.955 0 10 #46 na.fill.zoo
# 5 0.008 3.838 0 19 #46 na.fill.zoo/[<-
# 6 0.003 0.959 0 2 #52 na.fill.zoo
# 7 0.006 0.972 0 21 #52 na.fill.zoo/[<-
# 8 0.001 0.486 0 0 #57 na.fill.zoo/seq_len
# 9 0.005 0.959 0 6 #66 na.fill.zoo
# 10 0.124 11.573 0 34 #66 na.fill.zoo/[
几乎所有的时间都花在对 zoo
对象进行子集化上:
Almost all the time is being spent subsetting the zoo
object:
lineprof("[.zoo"(zoo.dat, 2:999999))
# time alloc release dups ref src
# 1 0.004 0.004 0 0 character(0)
# 2 0.002 1.922 0 4 #4 [.zoo/coredata
# 3 0.038 11.082 0 29 #19 [.zoo/zoo
# 4 0.004 0.000 0 1 #28 [.zoo
几乎所有的时间子集都花在使用 zoo
函数构建一个新的动物园对象上:
Almost all the time subsetting is spent constructing a new zoo object with the zoo
function:
lineprof(zoo(y[2:999999], 2:999999))
# time alloc release dups ref src
# 1 0.021 4.395 0 8 c("zoo", "unique") zoo/unique
# 2 0.012 0.477 0 8 c("zoo", "ORDER") zoo/ORDER
# 3 0.001 0.477 0 1 "zoo" zoo
# 4 0.001 0.954 0 0 c("zoo", ":") zoo/:
# 5 0.015 3.341 0 5 "zoo" zoo
设置新动物园对象所需的各种操作(例如,确定唯一的时间点并对其进行排序).
Various operations needed to setup a new zoo object (e.g. determining unique time points and ordering them).
总而言之,zoo
包似乎通过构造一个新的zoo 对象而不是使用当前zoo 对象的内部结构为其滚动均值操作增加了很多开销;与基本 R 实现相比,这会造成 10 倍的速度下降,与 Rcpp 实现相比,速度会下降 100 倍.
In conclusion, the zoo
package appears to have added a lot of overhead to its rolling mean operations by constructing a new zoo object instead of using the internals of the current zoo object; this creates a 10x slowdown compared to a base R implementation and a 100x slowdown compared to an Rcpp implementation.
这篇关于与简单的 Rcpp 实现相比,为什么 zoo::rollmean 慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!