R: computing a rolling sum for irregular time series grouped by an id variable, with a time-based window

Problem description

I love R, but some problems are just plain hard.

The challenge is to find, for each person, the first instance in an irregular time series where the rolling sum is less than 30 over a time-based window of at least 6 hours. Here is a sample of the series:

Row Person  DateTime    Value
1   A   2014-01-01 08:15:00 5
2   A   2014-01-01 09:15:00 5
3   A   2014-01-01 10:00:00 5
4   A   2014-01-01 11:15:00 5
5   A   2014-01-01 14:15:00 5
6   B   2014-01-01 08:15:00 25
7   B   2014-01-01 10:15:00 25
8   B   2014-01-01 19:15:00 2
9   C   2014-01-01 08:00:00 20
10  C   2014-01-01 09:00:00 5
11  C   2014-01-01 13:45:00 1
12  D   2014-01-01 07:00:00 1
13  D   2014-01-01 08:15:00 13
14  D   2014-01-01 14:15:00 15

For Person A, Rows 1 & 5 create a minimum 6 hour interval with a running sum of 25 (which is less than 30).
For Person B, Rows 7 & 8 create a 9 hour interval with a running sum of 27 (again less than 30).
For Person C, using Rows 9 & 11, there is no minimum 6 hour interval (it is only 5.75 hours) although the running sum is 26 and is less than 30.
For Person D, using Rows 12 & 14, the interval is 7.25 hours but the running sum is 30 and is not less than 30.

Given n observations for a person, there are n*(n-1)/2 intervals that must be compared. For example, with n=2 there is just 1 interval to evaluate. For n=3 there are 3 intervals. And so on.
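The interval count can be checked directly in R. This is only a small illustration; the variable names are made up for the example and n = 5 matches Person A in the sample.

```r
# Count and enumerate the candidate (start, end) intervals for n observations.
n <- 5                        # Person A has 5 observations in the sample
n_intervals <- choose(n, 2)   # same value as n * (n - 1) / 2
pairs <- combn(n, 2)          # 2-row matrix: one column per (start, end) pair

n_intervals                   # 10
ncol(pairs)                   # 10
```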

I assume that this is a variation of the subset sum problem (http://en.wikipedia.org/wiki/Subset_sum_problem).

While the data can be sorted, I suspect this requires a brute-force solution that tests each interval.
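A brute-force search of this kind could look roughly like the base R sketch below. It is not the method from the answer. It assumes that "first" means the earliest end row and, for that end row, the earliest start row; it rebuilds the sample with UTC as an assumed time zone; and `series` and `first_interval` are hypothetical names introduced for the illustration.

```r
# Sample data rebuilt with an explicit time zone (UTC is an assumption made
# here so the example behaves the same on every machine).
series <- data.frame(
  Person = rep(c("A", "B", "C", "D"), times = c(5, 3, 3, 3)),
  DateTime = as.POSIXct(paste("2014-01-01", c(
    "08:15", "09:15", "10:00", "11:15", "14:15",
    "08:15", "10:15", "19:15",
    "08:00", "09:00", "13:45",
    "07:00", "08:15", "14:15")), format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Value = c(5, 5, 5, 5, 5, 25, 25, 2, 20, 5, 1, 1, 13, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one person's rows, return the first (start, end)
# pair spanning at least min_hours whose inclusive sum is below limit.
first_interval <- function(d, min_hours = 6, limit = 30) {
  d <- d[order(d$DateTime), ]
  running <- cumsum(d$Value)
  for (end in seq_len(nrow(d))) {        # earliest end row first
    for (start in seq_len(end - 1)) {    # then earliest start row
      hours <- as.numeric(difftime(d$DateTime[end], d$DateTime[start],
                                   units = "hours"))
      # Sum of Value over rows start..end, taken from the cumulative sum.
      total <- running[end] - running[start] + d$Value[start]
      if (hours >= min_hours && total < limit) {
        return(data.frame(Person = d$Person[1], start = d$DateTime[start],
                          end = d$DateTime[end], hours = hours, total = total,
                          stringsAsFactors = FALSE))
      }
    }
  }
  NULL                                   # no qualifying interval
}

# Apply per person and drop people without a qualifying interval.
hits <- lapply(split(series, series$Person), first_interval)
result <- do.call(rbind, Filter(Negate(is.null), hits))
print(result)
```

With the values as listed in the table, Person D's values add up to 1 + 13 + 15 = 29, so this sketch does report an interval for D, whereas the description above gives a sum of 30; presumably one of the two contains a typo.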

Any help would be greatly appreciated.

Here's the data with the DateTime column formatted as POSIXct:

df <- structure(list(Person = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"),
DateTime = structure(c(1388560500, 1388564100, 1388566800,
1388571300, 1388582100, 1388560500, 1388567700, 1388600100,
1388559600, 1388563200, 1388580300, 1388556000, 1388560500,
1388582100), class = c("POSIXct", "POSIXt"), tzone = ""),
Value = c(5L, 5L, 5L, 5L, 5L, 25L, 25L, 2L, 20L, 5L, 1L,
1L, 13L, 15L)), .Names = c("Person", "DateTime", "Value"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14"), class = "data.frame")

Recommended answer

I have found this to be a difficult problem in R as well. So I made a package for it!

library("devtools")
# Current devtools expects the repository as "user/repo"
install_github("mgahan/boRingTrees")
library(boRingTrees)

Of course, you will have to express the upper bound in the correct units. In the example further down, the 6-hour window is given in seconds as 6*60*60.
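As a quick sanity check on the units, the snippet below compares a 6-hour gap between two POSIXct timestamps with the value 6*60*60. That the package measures date differences in seconds is inferred from the answer's own call, not from its documentation; the variable names and the UTC time zone are assumptions for the illustration.

```r
# Two timestamps exactly 6 hours apart (first and last row of Person A).
t1 <- as.POSIXct("2014-01-01 08:15:00", tz = "UTC")
t2 <- as.POSIXct("2014-01-01 14:15:00", tz = "UTC")

gap_secs <- as.numeric(difftime(t2, t1, units = "secs"))
upper <- 6 * 60 * 60   # 6 hours expressed in seconds

gap_secs               # 21600
upper                  # 21600
```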

Here is some more documentation if you are interested: https://github.com/mgahan/boRingTrees

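For the data df that @beginneR provided, you could use the following code to get a 6-hour rolling sum. Because DateTime is stored with tzone = "", timestamps print in the local time zone, which is why the output shows different clock times than the table in the question.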

require(data.table)
setDT(df)
df[ , roll := rollingByCalcs(df,dates="DateTime",target="Value",
                    by="Person",stat=sum,lower=0,upper=6*60*60)]

    Person            DateTime Value roll
 1:      A 2014-01-01 01:15:00     5    5
 2:      A 2014-01-01 02:15:00     5   10
 3:      A 2014-01-01 03:00:00     5   15
 4:      A 2014-01-01 04:15:00     5   20
 5:      A 2014-01-01 07:15:00     5   25
 6:      B 2014-01-01 01:15:00    25   25
 7:      B 2014-01-01 03:15:00    25   50
 8:      B 2014-01-01 12:15:00     2    2
 9:      C 2014-01-01 01:00:00    20   20
10:      C 2014-01-01 02:00:00     5   25
11:      C 2014-01-01 06:45:00     1   26
12:      D 2014-01-01 00:00:00     1    1
13:      D 2014-01-01 01:15:00    13   14
14:      D 2014-01-01 07:15:00    15   28
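For readers who want to see what the roll column represents without installing the package, here is a base R sketch that reproduces the same numbers. It does not call boRingTrees. It assumes the window is closed on both ends, [t - 6 hours, t], which is inferred from the printed output (rows 5 and 14 include an observation exactly 6 hours earlier). `series` and `rolling_sum` are hypothetical names, and UTC is an assumed time zone.

```r
# Sample data rebuilt with an explicit time zone (UTC is an assumption).
series <- data.frame(
  Person = rep(c("A", "B", "C", "D"), times = c(5, 3, 3, 3)),
  DateTime = as.POSIXct(paste("2014-01-01", c(
    "08:15", "09:15", "10:00", "11:15", "14:15",
    "08:15", "10:15", "19:15",
    "08:00", "09:00", "13:45",
    "07:00", "08:15", "14:15")), format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Value = c(5, 5, 5, 5, 5, 25, 25, 2, 20, 5, 1, 1, 13, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, sum Value over rows of the same Person
# whose DateTime lies in the closed window [t - window_secs, t].
rolling_sum <- function(d, window_secs) {
  secs <- as.numeric(d$DateTime)          # seconds since the epoch
  vapply(seq_len(nrow(d)), function(i) {
    in_window <- d$Person == d$Person[i] &
      secs <= secs[i] &
      secs >= secs[i] - window_secs
    sum(d$Value[in_window])
  }, numeric(1))
}

series$roll <- rolling_sum(series, window_secs = 6 * 60 * 60)
print(series)
```

Note that this is a trailing window of at most 6 hours, whereas the question asks about intervals of at least 6 hours, so the roll column on its own does not identify the intervals the question describes.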

The original post is pretty unclear to me, so this might not be exactly what the asker wanted. If a column with the desired output were provided, I imagine I could be of more help.

