Problem description
The language I'm using is R, but you don't necessarily need to know R to answer the question.
Question: I have a sequence that can be considered the ground truth, and another sequence that is a shifted version of the first, with some missing values. I'd like to know how to align the two.
Setup
I have a sequence ground.truth that is basically a set of times:
ground.truth <- rep( seq(1,by=4,length.out=10), 5 ) +
rep( seq(0,length.out=5,by=4*10+30), each=10 )
Think of ground.truth as times where I'm doing the following:
{take a sample every 4 seconds for 10 times, then wait 30 seconds} x 5
I have a second sequence, observations, which is ground.truth shifted, with 20% of the values missing:
nSamples <- length(ground.truth)
idx_to_keep <- sort(sample( 1:nSamples, .8*nSamples ))
theLag <- runif(1)*100
observations <- ground.truth[idx_to_keep] + theLag
nObs <- length(observations)
If I plot these vectors, this is what it looks like (remember, think of these as times):
What I've tried
I want to:
- calculate the shift (theLag in my example above)
- calculate a vector idx such that ground.truth[idx] == observations - theLag
First, assume we know theLag. Note that ground.truth[1] is not necessarily observations[1] - theLag. In fact, we have ground.truth[1] == observations[1 + lagI] - theLag for some lagI.
To calculate this, I thought I'd use cross-correlation (the ccf function).
However, whenever I do this, the maximum cross-correlation occurs at a lag of 0, meaning ground.truth[1] == observations[1] - theLag. But I've tried this in examples where I've explicitly made sure that observations[1] - theLag is not ground.truth[1] (i.e. I modified idx_to_keep to make sure it doesn't contain 1).
The shift theLag shouldn't affect the cross-correlation (isn't ccf(x, y) == ccf(x, y - constant)?), so I was going to work it out later.
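A quick way to check that invariance is sketched below. The two short vectors are made up purely for illustration; the only library call used is ccf from the stats package, which removes each series' mean by default, so subtracting a constant from y should leave the result unchanged.

```r
# Hypothetical toy series, not the data from the question.
x <- c(1, 5, 9, 13, 17, 21, 25, 29, 33, 37)
y <- c(5, 9, 17, 21, 25, 33, 37, 41, 45, 49)

# Cross-correlation values with and without a constant offset on y.
plain       <- ccf(x, y, plot = FALSE)$acf
with_offset <- ccf(x, y - 100, plot = FALSE)$acf

# Compare up to floating-point tolerance.
same <- isTRUE(all.equal(plain, with_offset))
```

Note that this only confirms the offset is harmless. ccf still compares the vectors position by position, so it treats the values as a regularly sampled signal rather than as event times.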
Perhaps I'm misunderstanding, though, because observations doesn't have as many values in it as ground.truth? Even in the simpler case where I set theLag == 0, the cross-correlation function still fails to identify the correct lag, which leads me to believe I'm thinking about this wrong.
Does anyone have a general methodology for going about this, or know of some R functions/packages that could help?
Thanks a lot.
Recommended answer
For the lag, you can compute all the differences (distances) between your two sets of points:
diffs <- outer(observations, ground.truth, '-')
Your lag should be the value that appears length(observations) times:
which(table(diffs) == length(observations))
# 55.715382960625
# 86
Double check:
theLag
# [1] 55.71538
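Counting exact repeats of floating-point differences can be fragile, so here is a minimal sketch of a more tolerant variant. To keep it reproducible it assumes a fixed set of dropped indices and a fixed lag instead of sample and runif. The 6-decimal rounding and the names rawDiffs, binned, modalBin and estimatedLag are illustrative choices, not part of the question.

```r
# Rebuild the ground truth from the question.
ground.truth <- rep(seq(1, by = 4, length.out = 10), 5) +
  rep(seq(0, length.out = 5, by = 4 * 10 + 30), each = 10)

# Assumption: deterministic stand-ins for sample() and runif().
idx_to_keep  <- setdiff(seq_along(ground.truth), seq(3, 50, by = 5))
theLag       <- 55.715382960625
observations <- ground.truth[idx_to_keep] + theLag

# All pairwise differences, binned by rounding so that values that
# differ only by floating-point noise are counted together.
rawDiffs <- outer(observations, ground.truth, "-")
binned   <- round(rawDiffs, 6)
counts   <- table(binned)

# The most frequent difference is the candidate lag.
modalBin <- as.numeric(names(counts)[which.max(counts)])

# Refine the estimate by averaging the unrounded differences in that bin.
estimatedLag <- mean(rawDiffs[abs(rawDiffs - modalBin) < 1e-6])
```

Each observation can match at most one ground-truth time for a given shift, so no difference can occur more than length(observations) times, and the true lag reaches that count. Ties are possible if the kept points happen to fit another shift equally well, in which case the data alone cannot tell the shifts apart.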
The second part of your question is easy once you have found theLag:
idx <- which(ground.truth %in% (observations - theLag))
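Because %in% tests for exact equality, (ground.truth + theLag) - theLag may occasionally fail to match ground.truth bit for bit. The sketch below is one possible tolerant alternative that picks the nearest ground-truth time for each observation. The fixed indices, the fixed lag, the tolerance tol and the name shifted are assumptions made for illustration.

```r
# Rebuild the data with deterministic stand-ins for sample() and runif().
ground.truth <- rep(seq(1, by = 4, length.out = 10), 5) +
  rep(seq(0, length.out = 5, by = 4 * 10 + 30), each = 10)
idx_to_keep  <- setdiff(seq_along(ground.truth), seq(3, 50, by = 5))
theLag       <- 55.715382960625
observations <- ground.truth[idx_to_keep] + theLag

# Undo the lag, then find the nearest ground-truth time for each observation.
shifted <- observations - theLag
idx <- vapply(shifted,
              function(t) which.min(abs(ground.truth - t)),
              integer(1))

# Largest distance between an observation and its matched time;
# anything above the tolerance would indicate a bad lag estimate.
tol      <- 1e-6
maxError <- max(abs(ground.truth[idx] - shifted))
allMatched <- maxError < tol
```

Here idx has one entry per observation, so ground.truth[idx] lines up element by element with observations - theLag.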