Problem description
I have two matrices of normalized read counts, one for control and one for treatment, covering a time series from day 1 to day 26. I want to calculate a distance matrix with Dynamic Time Warping (DTW) and then use it for clustering, but it seems too complicated. This is what I tried; can anyone help clarify? Thanks a lot.
> head(control[,1:4])
MAST2 WWC2 PHYHIPL R3HDM2
Control_D1 6.591024 5.695156 3.388652 5.756384
Control_D1 8.043454 5.365221 6.859768 6.936970
Control_D3 7.731590 4.868267 6.919972 6.931073
Control_D4 8.129948 5.105528 6.627016 7.090268
Control_D5 7.690863 4.729501 6.824746 6.904610
Control_D6 8.101723 5.334501 6.868990 7.115883
>
> head(lead[,1:4])
MAST2 WWC2 PHYHIPL R3HDM2
Lead30_D1 6.418423 5.610699 3.734425 5.778046
Lead30_D2 7.918360 4.295191 6.559294 6.780952
Lead30_D3 7.807142 4.294722 6.599187 6.716040
Lead30_D4 7.856720 4.432136 6.572337 6.848483
Lead30_D5 7.827311 4.204738 6.607107 6.784094
Lead30_D6 7.848760 4.458451 6.581216 6.943003
>
> dim(control)
[1] 26 2603
> dim(lead)
[1] 26 2603
library(dtw)
for (i in control) {
for (j in lead) {
result[i,j] <- dtw( dist(control[,,i],lead[,,j]), distance.only=T )$normalizedDistance
}
}
This gives:
Error in lead[, , j] : incorrect number of dimensions
Recommended answer
There have already been questions similar to yours, but the answers weren't very detailed. Here's a breakdown of what you need to know, specifically for R.
The proxy package is made specifically for calculating cross-distance matrices. You should check its vignette to see which measures it already implements. An example of its use:
set.seed(1L)
sample_data <- matrix(rnorm(50L), nrow = 5L, ncol = 10L)
suppressPackageStartupMessages(library(proxy))
distance_matrix <- proxy::dist(sample_data, method = "euclidean",
upper = TRUE, diag = TRUE)
print(distance_matrix)
#> 1 2 3 4 5
#> 1 0.000000 2.636027 3.834764 5.943374 3.704322
#> 2 2.636027 0.000000 2.587398 4.515470 2.310364
#> 3 3.834764 2.587398 0.000000 4.008678 3.899561
#> 4 5.943374 4.515470 4.008678 0.000000 5.059321
#> 5 3.704322 2.310364 3.899561 5.059321 0.000000
Note: in the context of time series, proxy treats each row in a matrix as a series, which can be confirmed by the fact that sample_data above is a 5x10 matrix and the resulting cross-distance matrix is 5x5.
The dtw package implements many variations of DTW, and it also leverages proxy. You could calculate a DTW distance matrix with:
suppressPackageStartupMessages(library(dtw))
dtw_distmat <- proxy::dist(sample_data, method = "dtw",
upper = TRUE, diag = TRUE)
print(dtw_distmat)
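To make it clearer what these distances measure, here is a minimal base-R sketch of DTW between two univariate series. The function name simple_dtw is made up for illustration. It uses the absolute difference as local cost and a recursion in which diagonal steps count double, which I believe corresponds to the symmetric2 step pattern that dtw uses by default. It is not the package's implementation and makes no attempt at speed or windowing.

```r
# Hypothetical helper, not part of dtw/proxy/dtwclust.
# DTW between two numeric vectors: diagonal steps cost twice the local
# distance, horizontal and vertical steps cost it once.
simple_dtw <- function(x, y, normalize = FALSE) {
  n <- length(x)
  m <- length(y)
  local_cost <- abs(outer(x, y, "-"))       # n x m matrix of |x[i] - y[j]|
  acc <- matrix(Inf, nrow = n + 1L, ncol = m + 1L)
  acc[1L, 1L] <- 0                          # padded border simplifies the loop
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- local_cost[i, j]
      acc[i + 1L, j + 1L] <- min(
        acc[i, j + 1L] + cost,              # advance in x only
        acc[i + 1L, j] + cost,              # advance in y only
        acc[i, j] + 2 * cost                # advance in both (diagonal)
      )
    }
  }
  d <- acc[n + 1L, m + 1L]
  if (normalize) d / (n + m) else d         # normalize by combined length
}

# The repeated 2 in the second series is absorbed by warping
warped <- simple_dtw(c(1, 2, 3), c(1, 2, 2, 3))
# Constant offset of 1 between two series of length 3
offset <- simple_dtw(c(0, 0, 0), c(1, 1, 1))
offset_norm <- simple_dtw(c(0, 0, 0), c(1, 1, 1), normalize = TRUE)
```

Here warped is 0 because the extra point can be matched at no cost, while offset is 6 and offset_norm is 1, since every alignment has to pay the constant difference.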
Using custom distances
One nice thing about proxy is that it lets you register custom functions. You seem to be interested in the normalized version of DTW, so you could do something like this:
ndtw <- function(x, y = NULL, ...) {
dtw::dtw(x, y, ..., distance.only = TRUE)$normalizedDistance
}
pr_DB$set_entry(
FUN = ndtw,
names = "ndtw",
loop = TRUE,
distance = TRUE
)
ndtw_distmat <- proxy::dist(sample_data, method = "ndtw",
upper = TRUE, diag = TRUE)
print(ndtw_distmat)
#> 1 2 3 4 5
#> 1 0.0000000 0.4046622 0.5075772 0.6789465 0.5290478
#> 2 0.4046622 0.0000000 0.3630849 0.4866252 0.3612722
#> 3 0.5075772 0.3630849 0.0000000 0.5678698 0.3303344
#> 4 0.6789465 0.4866252 0.5678698 0.0000000 0.5078112
#> 5 0.5290478 0.3612722 0.3303344 0.5078112 0.0000000
See the documentation of pr_DB for more information.
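As a small sketch of the other registry operations, the snippet below registers a made-up measure, inspects it, uses it, and removes it again. The function mad_dist and the toy matrix are hypothetical, chosen only so the result is easy to check by hand. As far as I know, set_entry() refuses to overwrite an existing name, which is why the registration is guarded with entry_exists().

```r
library(proxy)

# Hypothetical custom measure for illustration: mean absolute difference
mad_dist <- function(x, y) mean(abs(x - y))

# Guard the registration so re-running the script does not fail
if (!pr_DB$entry_exists("mad_dist")) {
  pr_DB$set_entry(FUN = mad_dist, names = "mad_dist",
                  loop = TRUE, distance = TRUE)
}

pr_DB$get_entry("mad_dist")    # inspect what was registered

# Three constant series, one per row
toy <- matrix(c(0, 0, 0,
                1, 1, 1,
                3, 3, 3), nrow = 3L, byrow = TRUE)
toy_distmat <- proxy::dist(toy, method = "mad_dist")
print(toy_distmat)

pr_DB$delete_entry("mad_dist") # remove the entry when it is no longer needed
```

With loop = TRUE, proxy calls the function once per pair of rows, so the function itself only has to handle two vectors.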
The dtwclust package (which I made) implements a basic but faster version of DTW that can use multi-threading and also leverages proxy:
suppressPackageStartupMessages(library(dtwclust))
dtw_basic_distmat <- proxy::dist(sample_data, method = "dtw_basic", normalize = TRUE)
print(dtw_basic_distmat)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.0000000 0.4046622 0.5075772 0.6789465 0.5290478
#> [2,] 0.4046622 0.0000000 0.3630849 0.4866252 0.3612722
#> [3,] 0.5075772 0.3630849 0.0000000 0.5678698 0.3303344
#> [4,] 0.6789465 0.4866252 0.5678698 0.0000000 0.5078112
#> [5,] 0.5290478 0.3612722 0.3303344 0.5078112 0.0000000
The dtw_basic implementation only supports two step patterns and one window type, but it is considerably faster:
suppressPackageStartupMessages(library(microbenchmark))
microbenchmark(
proxy::dist(sample_data, method = "dtw", window.type = "sakoechiba", window.size = 5L),
proxy::dist(sample_data, method = "dtw_basic", window.size = 5L)
)
Unit: microseconds
expr                                                                                     min      lq       mean     median   uq       max       neval cld
proxy::dist(sample_data, method = "dtw", window.type = "sakoechiba", window.size = 5L)  5279.124 5621.742 6070.069 5802.354 6348.199 10411.000 100   b
proxy::dist(sample_data, method = "dtw_basic", window.size = 5L)                         657.966  710.418  776.474  752.282  814.037  1161.626  100   a
Another multi-threaded implementation is included in the parallelDist package, although I haven't personally tested it.
A single multivariate series is commonly a matrix where time spans the rows and the variables span the columns. DTW also works for them:
mv_series1 <- matrix(rnorm(15L), nrow = 5L, ncol = 3L)
mv_series2 <- matrix(rnorm(15L), nrow = 5L, ncol = 3L)
print(dtw_distance <- dtw_basic(mv_series1, mv_series2))
#> [1] 22.80421
The nice thing about proxy is that it can also calculate distances between objects contained in lists, so you can put several multivariate series in a list of matrices:
mv_series <- lapply(1L:5L, function(dummy) {
matrix(rnorm(15L), nrow = 5L, ncol = 3L)
})
mv_distmat_dtwclust <- proxy::dist(mv_series, method = "dtw_basic")
print(mv_distmat_dtwclust)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.00000 27.43599 32.14207 36.42211 31.19279
#> [2,] 27.43599 0.00000 20.88470 23.88436 29.73219
#> [3,] 32.14207 20.88470 0.00000 22.14376 29.99899
#> [4,] 36.42211 23.88436 22.14376 0.00000 28.81111
#> [5,] 31.19279 29.73219 29.99899 28.81111 0.00000
Your case
Regardless of what you choose, you can probably use proxy to get your result, but since you haven't provided your whole data, I can't give you a more specific example. I presume that dtwclust::dtw_basic(control[, 1:4], lead[, 1:4], normalize = TRUE) would give you the distance between one pair of series, assuming you're treating each one as a multivariate series with 4 variables.
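If you would rather treat every gene as its own univariate series, something along these lines might work. It is only a sketch under assumptions: the matrices below are random stand-ins with 6 genes instead of 2603, the gene names are invented, and average-linkage hierarchical clustering with 2 clusters is an arbitrary choice. Because proxy expects one series per row and your data has days in rows, the matrices are transposed first. Note also that control and lead are two-dimensional, so they need two indices (for example control[, i]) rather than three as in control[, , i].

```r
library(proxy)
library(dtwclust)

# Made-up stand-ins for the real data: 26 days (rows) x 6 genes (columns)
set.seed(42L)
gene_names <- paste0("gene", 1:6)
control <- matrix(rnorm(26L * 6L), nrow = 26L,
                  dimnames = list(paste0("Control_D", 1:26), gene_names))
lead <- matrix(rnorm(26L * 6L), nrow = 26L,
               dimnames = list(paste0("Lead30_D", 1:26), gene_names))

# One row per gene after transposing; entry [i, j] compares control gene i
# with lead gene j
cross_distmat <- proxy::dist(t(control), t(lead),
                             method = "dtw_basic", normalize = TRUE)

# Clustering needs distances within a single set of series
control_distmat <- proxy::dist(t(control),
                               method = "dtw_basic", normalize = TRUE)

# Drop the extra class and convert to a regular "dist" object for hclust
gene_tree <- hclust(stats::as.dist(unclass(control_distmat)),
                    method = "average")
gene_groups <- cutree(gene_tree, k = 2L)
```

The diagonal of cross_distmat then holds the control-versus-lead distance for each gene, which may be what you are after if the goal is to find genes whose profiles change under treatment.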