与简单的 Rcpp 实现相比，为什么 zoo::rollmean 慢?

本文介绍了与简单的 Rcpp 实现相比，为什么 zoo::rollmean 慢?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

zoo::rollmean 是一个有用的函数，它返回时间序列的滚动平均值；对于长度为 n 和窗口大小为 k 的向量 x，它返回向量 c(mean(x[1:k]),mean(x[2:(k+1)]), ..., mean(x[(n-k+1):n])).

zoo::rollmean is a helpful function that returns the rolling mean of a time series; for vector x of length n and window size k it returns the vector c(mean(x[1:k]), mean(x[2:(k+1)]), ..., mean(x[(n-k+1):n])).

我注意到我正在开发的一些代码似乎运行缓慢，所以我使用 Rcpp 包和一个简单的 for 循环编写了自己的版本:

I noticed that it seemed to be running slowly for some code I was developing, so I wrote my own version using the Rcpp package and a simple for loop:

library(Rcpp)
cppFunction("NumericVector rmRcpp(NumericVector dat, const int window) {
  const int n = dat.size();
  NumericVector ret(n-window+1);
  double summed = 0.0;
  for (int i=0; i < window; ++i) {
    summed += dat[i];
  }
  ret[0] = summed / window;
  for (int i=window; i < n; ++i) {
    summed += dat[i] - dat[i-window];
    ret[i-window+1] = summed / window;
  }
  return ret;
}")

令我惊讶的是，这个版本的函数比 zoo::rollmean 函数快得多:

To my surprise, this version of the function is much faster than the zoo::rollmean function:

# Time series with 1000 elements
set.seed(144)
y <- rnorm(1000)
x <- 1:1000
library(zoo)
zoo.dat <- zoo(y, x)

# Make sure our function works
all.equal(as.numeric(rollmean(zoo.dat, 3)), rmRcpp(y, 3))
# [1] TRUE

# Benchmark
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), rmRcpp(y, 3))
# Unit: microseconds
#                  expr     min       lq       mean    median        uq       max neval
#  rollmean(zoo.dat, 3) 685.494 904.7525 1776.88666 1229.2475 1744.0720 15724.321   100
#          rmRcpp(y, 3)   6.638  12.5865   46.41735   19.7245   27.4715  2418.709   100

即使对于更大的向量也能保持加速:

The speedup holds even for much larger vectors:

# Time series with 5 million elements
set.seed(144)
y <- rnorm(5000000)
x <- 1:5000000
library(zoo)
zoo.dat <- zoo(y, x)

# Make sure our function works
all.equal(as.numeric(rollmean(zoo.dat, 3)), rmRcpp(y, 3))
# [1] TRUE

# Benchmark
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), rmRcpp(y, 3), times=10)
# Unit: milliseconds
#                  expr        min         lq       mean     median         uq        max
#  rollmean(zoo.dat, 3) 2825.01622 3090.84353 3191.87945 3206.00357 3318.98129 3616.14047
#          rmRcpp(y, 3)   31.03014   39.13862   42.67216   41.55567   46.35191   53.01875

为什么一个简单的 Rcpp 实现的运行速度比 zoo::rollmean 快 100 倍?

Why does a simple Rcpp implementation run ~100x faster than zoo::rollmean?

推荐答案

感谢 @DirkEddelbuettel 指出问题中的比较并不是最公平的，因为我将 C++ 代码与纯 R 代码进行了比较.下面是一个简单的基本 R 实现(没有来自 zoo 包的所有检查)；这与 zoo::rollmean 如何实现滚动均值的核心计算非常相似:

Thanks to @DirkEddelbuettel for pointing out that the comparison made in the question wasn't the most fair because I was comparing C++ code to pure R code. The following is a simple base R implementation (without all the checks from the zoo package); this is quite similar to how zoo::rollmean implements the core computation for the rolling mean:

baseR.rollmean <- function(dat, window) {
  n <- length(dat)
  y <- dat[window:n] - dat[c(1, 1:(n-window))]
  y[1] <- sum(dat[1:window])
  return(cumsum(y) / window)
}

与 zoo:rollmean 相比，我们看到这仍然要快很多:

Comparing to zoo:rollmean, we see that this is still a good deal faster:

set.seed(144)
y <- rnorm(1000000)
x <- 1:1000000
library(zoo)
zoo.dat <- zoo(y, x)
all.equal(as.numeric(rollmean(zoo.dat, 3)), baseR.rollmean(y, 3), RcppRoll::roll_mean(y, 3), rmRcpp(y, 3))
# [1] TRUE
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), baseR.rollmean(y, 3), RcppRoll::roll_mean(y, 3), rmRcpp(y, 3), times=10)
# Unit: milliseconds
#                       expr        min         lq       mean     median         uq        max neval
#       rollmean(zoo.dat, 3) 507.124679 516.671897 646.813716 563.897005 593.861499 1220.08272    10
#       baseR.rollmean(y, 3)  46.156480  47.804786  53.923974  49.250144  55.061844   76.47908    10
#  RcppRoll::roll_mean(y, 3)   7.714032   8.513042   9.014886   8.693255   8.885514   11.32817    10
#               rmRcpp(y, 3)   7.729959   8.045270   8.924030   8.388931   8.996384   12.49042    10

为了深入了解为什么我们在使用 base R 时看到了 10 倍的加速，我使用了 Hadley 的 lineprof 工具，在需要的地方从 zoo 包源中抓取源代码:

To delve into why we were seeing a 10x speedup while using base R, I used Hadley's lineprof tool, grabbing source code from the zoo package source where needed:

lineprof(rollmean.zoo(zoo.dat, 3))
#     time  alloc release dups ref                  src
# 1  0.001  0.954       0   26 #27 rollmean.zoo/unclass
# 2  0.001  0.954       0    0 #28 rollmean.zoo/:      
# 3  0.002  0.954       0    1 #28 rollmean.zoo        
# 4  0.001  1.431       0    0 #28 rollmean.zoo/seq_len
# 5  0.001  0.000       0    0 #28 rollmean.zoo/c      
# 6  0.006  2.386       0    1 #28 rollmean.zoo        
# 7  0.002  0.954       0    2 #31 rollmean.zoo/cumsum 
# 8  0.001  0.000       0    0 #31 rollmean.zoo//      
# 9  0.005  1.912       0    1 #33 rollmean.zoo        
# 10 0.013  2.898       0   14 #33 rollmean.zoo/[<-    
# 11 0.299 28.941       0  127 #34 rollmean.zoo/na.fill

很明显，几乎所有的时间都花在了 na.fill 函数上，该函数实际上是在已经计算出滚动平均值之后调用的.

Clearly almost all the time is being spent in the na.fill function, which is actually called after the rolling mean values have already been computed.

lineprof(na.fill.zoo(zoo.dat, fill=NA, 2:999999))
#     time  alloc release dups ref                  src
# 1  0.004  1.913       0   39 #26 na.fill.zoo/seq     
# 2  0.002  1.921       0    9 #33 na.fill.zoo/coredata
# 3  0.002  1.921       0    6 #37 na.fill.zoo/[<-     
# 4  0.001  0.955       0   10 #46 na.fill.zoo         
# 5  0.008  3.838       0   19 #46 na.fill.zoo/[<-     
# 6  0.003  0.959       0    2 #52 na.fill.zoo         
# 7  0.006  0.972       0   21 #52 na.fill.zoo/[<-     
# 8  0.001  0.486       0    0 #57 na.fill.zoo/seq_len 
# 9  0.005  0.959       0    6 #66 na.fill.zoo         
# 10 0.124 11.573       0   34 #66 na.fill.zoo/[

几乎所有的时间都花在对 zoo 对象进行子集化上:

Almost all the time is being spent subsetting the zoo object:

lineprof("[.zoo"(zoo.dat, 2:999999))
#    time  alloc release dups          ref            src
# 1 0.004  0.004       0    0 character(0)               
# 2 0.002  1.922       0    4           #4 [.zoo/coredata
# 3 0.038 11.082       0   29          #19 [.zoo/zoo     
# 4 0.004  0.000       0    1          #28 [.zoo

几乎所有的时间子集都花在使用 zoo 函数构建一个新的动物园对象上:

Almost all the time subsetting is spent constructing a new zoo object with the zoo function:

lineprof(zoo(y[2:999999], 2:999999))
#    time alloc release dups                ref        src
# 1 0.021 4.395       0    8 c("zoo", "unique") zoo/unique
# 2 0.012 0.477       0    8  c("zoo", "ORDER") zoo/ORDER 
# 3 0.001 0.477       0    1              "zoo" zoo       
# 4 0.001 0.954       0    0      c("zoo", ":") zoo/:     
# 5 0.015 3.341       0    5              "zoo" zoo

设置新动物园对象所需的各种操作(例如，确定唯一的时间点并对其进行排序).

Various operations needed to setup a new zoo object (e.g. determining unique time points and ordering them).

总而言之，zoo 包似乎通过构造一个新的zoo 对象而不是使用当前zoo 对象的内部结构为其滚动均值操作增加了很多开销；与基本 R 实现相比，这会造成 10 倍的速度下降，与 Rcpp 实现相比，速度会下降 100 倍.

In conclusion, the zoo package appears to have added a lot of overhead to its rolling mean operations by constructing a new zoo object instead of using the internals of the current zoo object; this creates a 10x slowdown compared to a base R implementation and a 100x slowdown compared to an Rcpp implementation.

这篇关于与简单的 Rcpp 实现相比，为什么 zoo::rollmean 慢?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！