Problem description
Is there some way to use rollapply (from the zoo package, or something similar) or its optimized functions (rollmean, rollmedian, etc.) to compute rolling functions with a time-based window instead of one based on a number of observations? What I want is simple: for each element in an irregular time series, I want to compute a rolling function with an N-day window. That is, the window should include all the observations up to N days before the current observation. The time series may also contain duplicates.
Here is an example. Given the following time series (dates are in day/month/year format):
date value
1/11/2011 5
1/11/2011 4
1/11/2011 2
8/11/2011 1
13/11/2011 0
14/11/2011 0
15/11/2011 0
18/11/2011 1
21/11/2011 4
5/12/2011 3
A rolling median with a 5-day window, aligned to the right, should result in the following calculation:
> c(
median(c(5)),
median(c(5,4)),
median(c(5,4,2)),
median(c(1)),
median(c(1,0)),
median(c(0,0)),
median(c(0,0,0)),
median(c(0,0,0,1)),
median(c(1,4)),
median(c(3))
)
[1] 5.0 4.5 4.0 1.0 0.5 0.0 0.0 0.0 2.5 3.0
I have already found some solutions, but they are usually tricky, which usually means slow. I managed to implement my own rolling function calculation. The problem is that for very long time series the optimized version of median (rollmedian) can make a huge difference in run time, since it takes the overlap between consecutive windows into account. I would like to avoid reimplementing it. I suspect there is some trick with the rollapply parameters that will make it work, but I cannot figure it out. Thanks in advance for the help.
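Answer
As of v1.9.8 (on CRAN 25 Nov 2016), data.table has gained the ability to perform non-equi joins, which can be used here.
The OP has requested: "for each element in an irregular time series, I want to compute a rolling function with an N-day window. That is, the window should include all the observations up to N days before the current observation. The time series may also contain duplicates."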
Note that the OP has requested to include all the observations up to N days before the current observation. This is different from requesting all the observations up to N days before the current day.
For the latter, I would expect only one value for 1/11/2011, i.e., median(c(5, 4, 2)) = 4.
Apparently, the OP expects an observation-based rolling window which is limited to N days. Therefore, the join conditions of the non-equi join have to consider the row number as well.
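library(data.table)
n_days <- 5L
# Add the row number rn, then self-join DT with one lookup row per observation:
# row j falls into the window of row i if rn[j] <= rn[i] and
# date[i] - n_days <= date[j] <= date[i].
# as.double() keeps the result type consistent across groups, because
# median() of an integer vector returns an integer for odd lengths.
setDT(DT)[, rn := .I][
  .(ur = rn, ud = date, ld = date - n_days),
  on = .(rn <= ur, date <= ud, date >= ld),
  median(as.double(value)), by = .EACHI]$V1
[1] 5.0 4.5 4.0 1.0 0.5 0.0 0.0 0.0 2.5 3.0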
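To make the three join conditions concrete, here is a minimal base R sketch (no data.table needed) that applies them row by row to the example data. It assumes the rows are sorted by date, as in the question, and it is O(n²), so it only illustrates which observations land in each window; it is not a replacement for the join.

```r
# Example data from the question, sorted by date (day/month/year)
date  <- as.Date(c("1/11/2011", "1/11/2011", "1/11/2011", "8/11/2011",
                   "13/11/2011", "14/11/2011", "15/11/2011", "18/11/2011",
                   "21/11/2011", "5/12/2011"), format = "%d/%m/%Y")
value <- c(5, 4, 2, 1, 0, 0, 0, 1, 4, 3)
n_days <- 5
rn <- seq_along(value)  # row numbers, like rn := .I

# For each row i (ur = rn[i], ud = date[i], ld = date[i] - n_days), keep the
# rows satisfying the same three conditions as the non-equi join:
# rn <= ur, date <= ud, date >= ld
res <- vapply(rn, function(i) {
  in_window <- rn <= rn[i] & date <= date[i] & date >= date[i] - n_days
  median(value[in_window])
}, numeric(1))
print(res)
```

The rn <= ur condition is what separates the three observations on 1/11/2011 into three growing windows instead of one shared window.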
For the sake of completeness, a possible solution for the day-based rolling window could be:
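# One lookup row per distinct day: window [ud - n_days, ud]
setDT(DT)[.(ud = unique(date), ld = unique(date) - n_days), on = .(date <= ud, date >= ld),
  median(as.double(value)), by = .EACHI]
         date       date  V1
1: 2011-11-01 2011-10-27 4.0
2: 2011-11-08 2011-11-03 1.0
3: 2011-11-13 2011-11-08 0.5
4: 2011-11-14 2011-11-09 0.0
5: 2011-11-15 2011-11-10 0.0
6: 2011-11-18 2011-11-13 0.0
7: 2011-11-21 2011-11-16 2.5
8: 2011-12-05 2011-11-30 3.0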
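For comparison, the day-based windows can be sketched in base R as well, assuming the same example data sorted by date: one window per distinct date ud, covering [ud - n_days, ud]. The first window contains all three observations from 1/11/2011, so its median is 4, as expected above.

```r
# Example data from the question, sorted by date (day/month/year)
date  <- as.Date(c("1/11/2011", "1/11/2011", "1/11/2011", "8/11/2011",
                   "13/11/2011", "14/11/2011", "15/11/2011", "18/11/2011",
                   "21/11/2011", "5/12/2011"), format = "%d/%m/%Y")
value <- c(5, 4, 2, 1, 0, 0, 0, 1, 4, 3)
n_days <- 5

ud <- unique(date)  # one window per distinct day
res <- vapply(seq_along(ud), function(k) {
  # all observations whose date lies in [ud[k] - n_days, ud[k]]
  median(value[date <= ud[k] & date >= ud[k] - n_days])
}, numeric(1))
out <- data.frame(date = ud, V1 = res)
print(out)
```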
Data
library(data.table)
DT <- fread(" date value
1/11/2011 5
1/11/2011 4
1/11/2011 2
8/11/2011 1
13/11/2011 0
14/11/2011 0
15/11/2011 0
18/11/2011 1
21/11/2011 4
5/12/2011 3")[
# coerce date from character string to integer date class
, date := as.IDate(date, "%d/%m/%Y")]
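If the goal is to stay close to the rollapply idea from the question, one possible approach (not part of the answer above, and sketched in base R under the assumption that the series is sorted by date) is to precompute the width of each right-aligned window with findInterval(). With left.open = TRUE, findInterval() counts the observations strictly before each window's start date, so the width is the row index minus that count. The names d, lo and w are illustrative only.

```r
# Example data from the question, sorted by date (day/month/year)
date  <- as.Date(c("1/11/2011", "1/11/2011", "1/11/2011", "8/11/2011",
                   "13/11/2011", "14/11/2011", "15/11/2011", "18/11/2011",
                   "21/11/2011", "5/12/2011"), format = "%d/%m/%Y")
value <- c(5, 4, 2, 1, 0, 0, 0, 1, 4, 3)
n_days <- 5

d <- as.numeric(date)  # days since epoch; findInterval() needs a sorted vector

# Number of observations strictly before each window's start date d - n_days
lo <- findInterval(d - n_days, d, left.open = TRUE)

# Width (in observations) of each right-aligned window; duplicates on the
# same day are counted individually, so the window stays observation-based
w <- seq_along(d) - lo

# Window i covers value[(i - w[i] + 1):i]
res <- vapply(seq_along(value),
              function(i) median(value[(i - w[i] + 1):i]),
              numeric(1))
print(w)
print(res)
```

zoo's rollapply() documents that width may also be a numeric vector with one width per observation, so the same w could presumably be passed as zoo::rollapplyr(value, w, median). That still evaluates median() once per window, though; as far as I know the optimized rollmedian() only accepts a single fixed window width k, so this does not recover its overlap savings.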