问题描述
我想基于联接和固定值更新data.table中行的子集.
I'd like to update a subset of rows in a data.table based on a join and a fixed value.
d1 <- data.table(A = 5:1, B = letters[5:1])
d2 <- data.table(C = letters[5:1], Z = 6:10)
current.val <- 5
我要做的是基于与d2的联接更新d1,但仅在d1中A == 5的情况下进行.像这样的东西:
what i want to do is update d1 based on the join with d2, but only where where A==5 in d1. something like this:
d1[d2, D := i.Z ,on=.(B==C, A==current.val)]
我当前的方法是在d2中添加一个新列,并将其设置为固定值,并在联接中使用它:
my current approach is a add a new column to d2 and set it to the fixed value and use that in the join:
d2[, current.val := 5]
d1[d2, D := i.Z ,on=.(B==C, A==current.val)]
这有效,但似乎有很多开销.有没有更简单的方法在联接中使用常量值?
This works, but seem like a lot of overhead. Is there a simpler way to use a constant value in a join?
(8/14)用于基准测试的新比例示例:
(8/14) New scale example for benchmarking:
d1 <- data.table(A = 100:1, B = 100000000:1, D = as.numeric(NA), key = c("A", "B"))
d2 <- data.table(C = 100000000:1, Z = c(10:1) / 10, key = "C")
current.val <- 5
system.time(d1[cbind(d2, A = current.val), on = .(B = C, A), D := i.Z])
system.time({setkey(d1, B, A); d1[d1[d2][A == current.val], D := Z]; setkey(d1, A, B)})
system.time(d1[d1[d2][A == current.val], D := Z]) # fastest, if inverse key order is acceptable
推荐答案
这是一个好方法.或者,您可以使用cbind
在连接内部临时添加一列:
That's a good way to go. Alternately, you could add a column temporarily inside the join with cbind
:
d1[cbind(d2, A = current.val), on=.(B = C, A), D := i.Z ]
实际上,在这里c
可以代替cbind
,但是我觉得这很奇怪.
Actually, c
works in place of cbind
here, but I find it a weirder approach.
这篇关于使用常量对data.table更新进行联接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!