I am a n00b at R and a n00b at stack overflow (just joined), so forgive me if I have failed to use markup (which I don't know) or missed something in the readme.
If you don't mind, I will go through my full problem here as perhaps you might be kind enough to shed some insight into how I should best go about this!
Stage 1
Construction of individual time-series objects for each TS.
Please find a data example below. Essentially, I am loading a csv file containing multiple irregular time series (e.g. TS1, TS2), so in an ideal world I would split these into individual irregular time-series objects (e.g. zoo?), one each for TS1, TS2, ... This problem was discussed here (R/zoo: handle non-unique index entries but not lose data?), and I have tried repeatedly to use that approach but failed.
Date TS Data
21/05/2014 TS1 0.95
17/04/2014 TS1 1.02
27/03/2014 TS1 0.90
30/01/2014 TS1 0.80
12/12/2013 TS1 0.70
18/09/2013 TS1 0.67
01/11/2012 TS1 0.71
01/11/2012 TS1 0.70
21/05/2014 TS2 0.47
20/05/2014 TS2 0.51
16/05/2014 TS2 0.49
15/05/2014 TS2 0.55
10/05/2014 TS2 0.63
07/05/2014 TS2 0.77
As can be seen, the problem arises due to the duplicate date index of 01/11/2012 for TS1, which causes read.zoo not to create my split data object.
Stage 2
What I would then like to do is, on every irregular date, add together all the data as of that date. Since all the time series are irregular, and with different regularity, I would like to use the prior value for a TS. E.g. for 21/05/2014, the calculation in the example is straightforward as both TS1 and TS2 have an entry, so the answer would be 0.47 + 0.95. But for 20/05, only TS2 has an entry, so the value for TS1 that should be used is the most recent one as of that date, i.e. the 17/04/2014 value of 1.02, so the calculation for 20/05/2014 should be 0.51 + 1.02. It could be that the simplest way of achieving this is to convert each TS into daily values, such that the previous value is used until a new data point arrives, but this is wasteful/unnecessary for stage 3 below.
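The carry-forward rule described above can be sketched on a few rows of the example data. This is only an illustration, assuming the zoo package is available; the names ts1, ts2, m, filled and total are made up for the sketch, and only three of the example dates are used.

```r
library(zoo)

# Two short irregular series taken from the example rows above
ts1 <- zoo(c(1.02, 0.95), as.Date(c("2014-04-17", "2014-05-21")))
ts2 <- zoo(c(0.51, 0.47), as.Date(c("2014-05-20", "2014-05-21")))

# merge() aligns on the union of dates and puts NA where a series has no entry
m <- merge(TS1 = ts1, TS2 = ts2)

# Carry the last observed value forward; na.rm = FALSE keeps leading NAs in place
filled <- na.locf(m, na.rm = FALSE)

# Sum across series on every date (NA where a series has not started yet)
total <- zoo(rowSums(coredata(filled)), time(filled))
print(total)
```

On 20/05/2014 the TS1 column holds the carried-forward 1.02, so the sum on that date is 0.51 + 1.02, without ever expanding the series to daily values.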
Stage 3
Having created this aggregated sum of all the TSs, I want to do a polynomial curve-fit. I also want to differentiate this curve-fit to find the rate of change as of today's date, as predicted by the fitted curve.
Any help would be much appreciated! I feel that repeatedly hitting my head against a wall would be more fun than doing anything more at this stage!!
Updated: I now have code as follows thanks to Grothendieck.
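library(zoo)

# Build the summed series for one user's rows
f <- function(z) {
  zz <- read.zoo(z, header = TRUE, split = 2, format = "%d/%m/%Y", aggregate = mean)
  z.fill <- na.locf(zz)                  # carry the last value forward
  z.fill <- (z.fill >= 0.5) * z.fill     # zero out values below 0.5
  z.fill <- na.fill(z.fill, 0)           # remaining NAs become 0
  zfill.mat <- matrix(z.fill, NROW(z.fill))
  z.sum <- rowSums(zfill.mat)
  zsum <- zoo(z.sum, time(z.fill))
  zsum  # the return value and closing brace were missing from the post; assumed here
}

DF <- read.csv(file.choose(), header = TRUE, as.is = TRUE)
DF.S <- split(DF[-2], DF[[2]])   # drop column 2 and split the rest by it
user <- DF[1, 2]
Ret <- lapply(DF.S, f)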
I have one remaining problem:
Ret is a list with one element per user. I can access an element by typing Ret$ followed by the user's name, but since the user varies, I need to make this dynamic. I have tried to construct a dynamic expression, e.g.:
x <- paste("Ret$'",user,"'",sep = "");
but could not get this to evaluate.
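One possible way around this, sketched under the assumption that Ret is a named list and user holds one of its names as a character string: index the list with `[[` instead of building a `$` expression as text. The list contents below are hypothetical stand-ins, not the real output of f.

```r
# Hypothetical stand-in for the list returned by lapply(DF.S, f)
Ret <- list(alice = c(1.5, 2.5), bob = c(3, 4))
user <- "alice"   # in the question this comes from DF[1, 2]

# [[ looks up the element whose name is stored in the variable `user`;
# Ret$user would instead look for an element literally named "user"
x <- Ret[[user]]
print(x)
```

The result x can then be passed to plot() or any other function in the usual way.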
read.zoo has an aggregate= argument which takes a function used to aggregate the values at duplicate times within the same series. Here we take the mean of duplicate days within a series, but you could use sum or any other function. (If the data were coming from a file, we would replace the text = Lines argument in read.zoo with something like "myfile.dat".) Then we use na.locf to fill in the NAs, sum the rows, and use na.omit to drop any leading NAs, giving zsum. Next we compute a regularly spaced time grid g and a spline function splfun, and evaluate that function and its derivative on the grid, which, after converting back to zoo, gives zspl and zder. Finally we plot them.
library(zoo)

Lines <- "Date TS Data
21/05/2014 TS1 0.95
17/04/2014 TS1 1.02
27/03/2014 TS1 0.90
30/01/2014 TS1 0.80
12/12/2013 TS1 0.70
18/09/2013 TS1 0.67
01/11/2012 TS1 0.71
01/11/2012 TS1 0.70
21/05/2014 TS2 0.47
20/05/2014 TS2 0.51
16/05/2014 TS2 0.49
15/05/2014 TS2 0.55
10/05/2014 TS2 0.63
07/05/2014 TS2 0.77"
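# split = 2 gives one column per TS; aggregate = mean collapses duplicate dates
z <- read.zoo(text = Lines, header = TRUE, split = 2, format = "%d/%m/%Y",
              aggregate = mean)
# carry the last value forward, sum across series, drop leading NAs
zsum <- na.omit(zoo(rowSums(na.locf(z)), time(z)))
# daily grid, then the spline and its first derivative evaluated on that grid
g <- seq(start(zsum), end(zsum), "day")
splfun <- splinefun(time(zsum), coredata(zsum))
zspl <- zoo(splfun(g), g)
zder <- zoo(splfun(g, deriv = 1), g)
plot(merge(zspl, zder))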
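The code above uses an interpolating spline rather than the polynomial fit the question asked for. If a polynomial is really wanted, a minimal base-R sketch might look like the following. It uses the TS2 rows from the example purely as a stand-in for zsum; the degree of 2 and the names degree, b, dfit and slope_at_last are assumptions made for illustration, not part of the original answer.

```r
# TS2 values from the example data, used here only as a stand-in for zsum
dates <- as.Date(c("2014-05-07", "2014-05-10", "2014-05-15",
                   "2014-05-16", "2014-05-20", "2014-05-21"))
y <- c(0.77, 0.63, 0.55, 0.49, 0.51, 0.47)

# Days since the first observation keep the regressor small
x <- as.numeric(dates - dates[1])

degree <- 2   # assumption: picked for illustration only
fit <- lm(y ~ poly(x, degree, raw = TRUE))
b <- unname(coef(fit))   # b[1] + b[2] * x + b[3] * x^2

# Analytic derivative of the fitted polynomial: sum over k of k * b[k + 1] * x^(k - 1)
dfit <- function(x0) {
  k <- seq_len(degree)
  sapply(x0, function(v) sum(k * b[k + 1] * v^(k - 1)))
}

# Fitted rate of change (units per day) at the last observed date
slope_at_last <- dfit(max(x))
print(slope_at_last)
```

Evaluating dfit at a date beyond the last observation (for example today's date) is an extrapolation, and polynomial fits can behave poorly there, so the result should be treated with caution.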