问题描述
我有一个policyData,这是我非常庞大的数据集(数百万行),我希望通过映射表(数万行)向其中添加一些信息.
I have a policyData which is my very huge data set (millions of rows) and I wish to add some info to it with a mapping table (tens of thousands of rows).
示例:
policyData <- data.table(plan=c("c","b","b","d"),v=c(8,7,5,6),foo=c(4,2,8,3))
mapping <- data.table(plan=c("b","b","a","a","c","c"),a=c(1,2,4,5,7,8),b=c(9,8,6,5,3,2))
policyData:
policyData:
plan v foo
1: c 8 4
2: b 7 2
3: b 5 8
4: d 6 3
映射:
plan a b
1: b 1 9
2: b 2 8
3: a 4 6
4: a 5 5
5: c 7 3
6: c 8 2
问题是该映射具有多个实例,我希望仅获得第一个匹配项.而且我需要使用:=
使用内存高效的方式将两者结合起来.
Problem is that the mapping has multiple instances and I wish to get the first match only. And I need to join the two using memory efficient way using :=
.
所需的输出是:
plan v foo a b
1: c 8 4 7 3
2: b 7 2 1 9
3: b 5 8 1 9
4: d 6 3 NA NA
我尝试过:
policyData[mapping, on="plan", `:=`(a=i.a, b=i.b)]
给出映射表中的最后一个实例:
which gives the last instance in the mapping table:
plan v foo a b
1: c 8 4 8 2
2: b 7 2 2 8
3: b 5 8 2 8
4: d 6 3 NA NA
我也尝试过:
policyData[mapping, on="plan", `:=`(a=i.a, b=i.b), mult="first"]
给出奇怪的结果(第二个"b"与映射不匹配):
which gives strange result (the second "b" can't match with the mapping):
plan v foo a b
1: c 8 4 8 2
2: b 7 2 2 8
3: b 5 8 NA NA
4: d 6 3 NA NA
任何见解都会有所帮助.我已经做了很多搜索.
Any insight would be helpful. I've done lots of searches already.
推荐答案
只需将mapping
与mapping[, .SD[1], by = plan]
进行汇总,然后将其用于联接:
Just summarise mapping
with mapping[, .SD[1], by = plan]
and use that for joining:
policyData[mapping[, .SD[1], by = plan]
, on = .(plan)
, `:=` (a = i.a, b = i.b)]
给出所需的输出:
> policyData
plan v foo a b
1: c 8 4 7 3
2: b 7 2 1 9
3: b 5 8 1 9
4: d 6 3 NA NA
这篇关于仅在使用data.table的:=连接两个表时需要第一实例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!