本文介绍了仅在使用data.table的:=连接两个表时需要第一实例的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个policyData,这是我非常庞大的数据集(数百万行),我希望通过映射表(数万行)向其中添加一些信息.

I have a policyData which is my very huge data set (millions of rows) and I wish to add some info to it with a mapping table (tens of thousands of rows).

示例:

policyData <- data.table(plan=c("c","b","b","d"),v=c(8,7,5,6),foo=c(4,2,8,3))
mapping <- data.table(plan=c("b","b","a","a","c","c"),a=c(1,2,4,5,7,8),b=c(9,8,6,5,3,2))

policyData:

policyData:

   plan v foo
1:    c 8   4
2:    b 7   2
3:    b 5   8
4:    d 6   3

映射:

   plan a b
1:    b 1 9
2:    b 2 8
3:    a 4 6
4:    a 5 5
5:    c 7 3
6:    c 8 2

问题是该映射具有多个实例,我希望仅获得第一个匹配项.而且我需要使用:=使用内存高效的方式将两者结合起来.

Problem is that the mapping has multiple instances and I wish to get the first match only. And I need to join the two using memory efficient way using :=.

所需的输出是:

   plan v foo  a  b
1:    c 8   4  7  3
2:    b 7   2  1  9
3:    b 5   8  1  9
4:    d 6   3 NA NA

我尝试过:

policyData[mapping, on="plan", `:=`(a=i.a, b=i.b)] 

给出映射表中的最后一个实例:

which gives the last instance in the mapping table:

   plan v foo  a  b
1:    c 8   4  8  2
2:    b 7   2  2  8
3:    b 5   8  2  8
4:    d 6   3 NA NA

我也尝试过:

policyData[mapping, on="plan", `:=`(a=i.a, b=i.b), mult="first"]

给出奇怪的结果(第二个"b"与映射不匹配):

which gives strange result (the second "b" can't match with the mapping):

   plan v foo  a  b
1:    c 8   4  8  2
2:    b 7   2  2  8
3:    b 5   8 NA NA
4:    d 6   3 NA NA

任何见解都会有所帮助.我已经做了很多搜索.

Any insight would be helpful. I've done lots of searches already.

推荐答案

只需将mappingmapping[, .SD[1], by = plan]进行汇总,然后将其用于联接:

Just summarise mapping with mapping[, .SD[1], by = plan] and use that for joining:

policyData[mapping[, .SD[1], by = plan]
           , on = .(plan)
           , `:=` (a = i.a, b = i.b)]

给出所需的输出:

> policyData
   plan v foo  a  b
1:    c 8   4  7  3
2:    b 7   2  1  9
3:    b 5   8  1  9
4:    d 6   3 NA NA

这篇关于仅在使用data.table的:=连接两个表时需要第一实例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-24 11:12