我想计算 data.table 中时间序列组之间的互相关。我有这种格式的时间序列数据:
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )
group Y
1: a 0.90855520
2: a -0.12463737
3: a -0.45754652
4: a 0.65789709
5: a 1.27632196
6: b 0.98483700
7: b -0.44282527
8: b -0.93169070
9: b -0.21878359
10: b -0.46713392
11: c -0.02199363
12: c -0.67125826
13: c 0.29263953
14: c -0.65064603
15: c -1.41143837
每个组都有相同数量的观察。我正在寻找的是一种获得组之间互相关的方法:
group.1 group.2 correlation
a b 0.xxx
a c 0.xxx
b c 0.xxx
我正在编写一个脚本来对每个组进行子集化并附加互相关,但数据大小相当大。有没有有效/禅宗的方法来做到这一点?
最佳答案
这有帮助吗?
data[,id:=rep(1:5,3)]
dtw = dcast.data.table(data, id ~ group, value.var="Y" )[, id := NULL]
cor(dtw)
见 Correlation between groups in R data.table
另一种方法是:
# data
set.seed(45L)
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )
# method 2
setkey(data, "group")
data2 = data[J(c("b", "c", "a"))][, list(group2=group, Y2=Y)]
data[, c(names(data2)) := data2]
data[, cor(Y, Y2), by=list(group, group2)]
# group group2 V1
# 1: a b -0.2997090
# 2: b c 0.6427463
# 3: c a -0.6922734
并将这种“其他”方式推广到三个以上的群体......
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5),rep("d",5)) ,
Y = rnorm(20) )
setkey(data, "group")
groups = unique(data$group)
ngroups = length(groups)
library(gtools)
pairs = combinations(ngroups,2,groups)
d1 = data[pairs[,1],,allow.cartesian=TRUE]
d2 = data[pairs[,2],,allow.cartesian=TRUE]
d1[,c("group2","Y2"):=d2]
d1[,cor(Y,Y2), by=list(group,group2)]
# group group2 V1
# 1: a b 0.10742799
# 2: a c 0.52823511
# 3: a d 0.04424170
# 4: b c 0.65407400
# 5: b d 0.32777779
# 6: c d -0.02425053
关于r - 与一个 data.table 中的多个组互相关,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/23001764/