r - Cross-correlation with multiple groups in one data.table

I'd like to calculate the cross-correlation between groups of time series within a data.table. My time-series data is in this format:

library(data.table)
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )

   group           Y
 1:    a  0.90855520
 2:    a -0.12463737
 3:    a -0.45754652
 4:    a  0.65789709
 5:    a  1.27632196
 6:    b  0.98483700
 7:    b -0.44282527
 8:    b -0.93169070
 9:    b -0.21878359
10:    b -0.46713392
11:    c -0.02199363
12:    c -0.67125826
13:    c  0.29263953
14:    c -0.65064603
15:    c -1.41143837

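Each group has the same number of observations. What I'm looking for is a way to get the cross-correlation between each pair of groups: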

group.1   group.2    correlation
      a         b          0.xxx
      a         c          0.xxx
      b         c          0.xxx

I'm writing a script that subsets each group and appends the cross-correlations, but the data is fairly large. Is there an efficient/zen way to do this?

Best answer

Does this help?

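# number observations within each group so rows line up across groups
data[, id := seq_len(.N), by = group]
# wide format: one column per group; drop the helper id column
dtw = dcast(data, id ~ group, value.var = "Y")[, id := NULL]
# correlation matrix between every pair of group columns
cor(dtw)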

See: Correlation between groups in R data.table
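cor(dtw) returns a full, symmetric correlation matrix rather than the long group.1 / group.2 / correlation table asked for in the question. A minimal sketch of reshaping it, assuming (as the question states) that every group has the same number of observations in the same time order; the seed is arbitrary and only there so the example is reproducible:

```r
library(data.table)

set.seed(1)  # arbitrary seed, for reproducibility only
data = data.table(group = rep(c("a", "b", "c"), each = 5), Y = rnorm(15))

# number the observations within each group so rows line up across groups
data[, id := seq_len(.N), by = group]
# wide format: one column per group, id column dropped afterwards
dtw = dcast(data, id ~ group, value.var = "Y")[, id := NULL]
cm = cor(dtw)  # symmetric matrix with the group names as dimnames

# keep only the upper triangle so each unordered pair appears once
idx = which(upper.tri(cm), arr.ind = TRUE)
res = data.table(group.1     = rownames(cm)[idx[, "row"]],
                 group.2     = colnames(cm)[idx[, "col"]],
                 correlation = cm[idx])
res
```

Note that cor() here is the ordinary (lag-0) Pearson correlation, which is what the expected output in the question describes.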

Another way is:

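# data
library(data.table)
set.seed(45L)
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )

# method 2
# after keying, data is ordered a, b, c; data2 is the same rows reordered
# as b, c, a, so row i of data2 is the partner of row i of data
# (a->b, b->c, c->a), which covers every pair when there are 3 groups
setkey(data, "group")
data2 = data[J(c("b", "c", "a"))][, list(group2=group, Y2=Y)]
data[, c(names(data2)) := data2]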

data[, cor(Y, Y2), by=list(group, group2)]

#     group group2         V1
# 1:      a      b -0.2997090
# 2:      b      c  0.6427463
# 3:      c      a -0.6922734

And generalizing this "other" approach to more than three groups:

data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5),rep("d",5)) ,
                   Y = rnorm(20) )
setkey(data, "group")

groups = unique(data$group)
ngroups = length(groups)
library(gtools)
pairs = combinations(ngroups,2,groups)

d1 = data[pairs[,1],,allow.cartesian=TRUE]
d2 = data[pairs[,2],,allow.cartesian=TRUE]
d1[,c("group2","Y2"):=d2]
d1[,cor(Y,Y2), by=list(group,group2)]
#    group group2          V1
# 1:     a      b  0.10742799
# 2:     a      c  0.52823511
# 3:     a      d  0.04424170
# 4:     b      c  0.65407400
# 5:     b      d  0.32777779
# 6:     c      d -0.02425053
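If adding gtools as a dependency is undesirable, base R's combn() can build the same pairs matrix. A minimal sketch reusing the join approach above, assuming the table is keyed on group (so unique() returns the groups sorted); the seed and the correlation column name are arbitrary choices for illustration:

```r
library(data.table)

set.seed(1)  # arbitrary seed, for reproducibility only
data = data.table(group = rep(c("a", "b", "c", "d"), each = 5), Y = rnorm(20))
setkey(data, group)
groups = unique(data$group)  # sorted, because data is keyed on group

# base-R stand-in for gtools::combinations(): one row per unordered pair
pairs = t(combn(groups, 2))

# join each side of every pair; rows repeat, hence allow.cartesian
d1 = data[pairs[, 1], allow.cartesian = TRUE]
d2 = data[pairs[, 2], allow.cartesian = TRUE]
d1[, c("group2", "Y2") := d2]
res = d1[, .(correlation = cor(Y, Y2)), by = .(group, group2)]
res
```

Like the original, this materializes every pair of series side by side, so memory grows with the number of pairs; the dcast/cor() route avoids that.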

For "r - Cross-correlation with multiple groups in one data.table", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23001764/