How to avoid vector search in data.table

Problem description

I have a data.table X in which I would like to create a variable based on two character variables:

   X[, varC :=((VarA =="A" & !is.na(VarA))
               | (VarA == "AB" & VarB =="B" & !is.na(VarA) & !is.na(VarB))
                )
      ]

This code works, but it is very slow because it does a vector scan on two character variables. Note that I have not set the key of the claims4 table to VarA and VarB. Is there a "right" way to do this in data.table?

Update 1: I don't use setkey for this transformation because I already use setkey(X, Year, ID) for other variable transformations. If I did, I would need to reset the key back to Year, ID after this transformation.

Update 2: I benchmarked my approach against Matthew's approach, and his is much faster:

          test replications elapsed relative user.self sys.self user.child sys.child
2 Matthew               100   3.377    1.000     2.596    0.605          0         0
1 vectorSearch          100 200.437   59.354    76.628   40.260          0         0

The only minor drawback is that calling setkey and then re-setting the key afterwards is somewhat verbose :)
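A comparison of this kind could be set up roughly as follows. This is only a sketch: the column layout of the table above matches what the rbenchmark package prints, so that package is assumed here. The synthetic data, the size n, and the helper names make_X, vector_search and keyed are all hypothetical, and the keyed variant joins on ("AB", "B") to match the condition in the question. Timings will vary with the machine, the data and the data.table version.

```r
library(data.table)
library(rbenchmark)   # assumed: the table above looks like benchmark() output

set.seed(1)
n <- 1e5              # hypothetical size, not the asker's real data

# Hypothetical synthetic table, keyed by Year, ID as in Update 1.
make_X <- function() {
  X <- data.table(
    Year = sample(2000:2010, n, replace = TRUE),
    ID   = seq_len(n),
    VarA = sample(c("A", "AB", "B", NA), n, replace = TRUE),
    VarB = sample(c("A", "B", NA), n, replace = TRUE)
  )
  setkey(X, Year, ID)
  X
}
X0 <- make_X()

# Vector-scan version from the question.
vector_search <- function(X) {
  X[, varC := (VarA == "A" & !is.na(VarA)) |
              (VarA == "AB" & VarB == "B" & !is.na(VarA) & !is.na(VarB))]
  X
}

# Keyed version: set the key, update by join, then restore the original key.
keyed <- function(X) {
  setkey(X, VarA, VarB)
  X[, varC := FALSE]["A", varC := TRUE][J("AB", "B"), varC := TRUE]
  setkey(X, Year, ID)
  X
}

# copy() keeps each run from modifying X0 by reference.
res <- benchmark(
  Matthew      = keyed(copy(X0)),
  vectorSearch = vector_search(copy(X0)),
  replications = 10
)
print(res)
```

Both functions should produce the same varC column, which is worth checking before trusting any timing.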

Solution

How about:

setkey(X,VarA,VarB)
X[,varC:=FALSE]
X["A",varC:=TRUE]            # rows where VarA == "A"
X[J("AB","B"),varC:=TRUE]    # rows where VarA == "AB" and VarB == "B"

Or, in one line (to save repeating the variable X and to demonstrate chaining):

X[,varC:=FALSE]["A",varC:=TRUE][J("AB","B"),varC:=TRUE]
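Since the question keeps X keyed by Year, ID, the key would have to be restored afterwards. Here is a minimal sketch on made-up data (the six rows are purely illustrative, and the join uses ("AB", "B") to match the condition in the question) that checks the keyed update against the original vector-scan expression:

```r
library(data.table)

# Hypothetical toy data; Year and ID stand in for the asker's existing key.
X <- data.table(
  Year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2012L),
  ID   = 1:6,
  VarA = c("A", "AB", "AB", NA, "B", "A"),
  VarB = c("B", "B", "C", "B", "B", NA)
)
setkey(X, Year, ID)

# Reference result from the vector-scan expression in the question.
expected <- X[, (VarA == "A" & !is.na(VarA)) |
                (VarA == "AB" & VarB == "B" & !is.na(VarA) & !is.na(VarB))]

# Keyed update: re-key (this physically reorders X), then join instead of scanning.
setkey(X, VarA, VarB)
X[, varC := FALSE]
X["A", varC := TRUE]            # VarA == "A", any VarB
X[J("AB", "B"), varC := TRUE]   # VarA == "AB" and VarB == "B"

# Restore the original key, which also restores the original row order here.
setkey(X, Year, ID)

stopifnot(identical(X$varC, expected))
```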

To avoid setting the key, as requested, how about a manual secondary key:

# S is a keyed copy of the two columns plus i, the row number in X
S = setkey(X[,list(VarA,VarB,i=seq_len(.N))],VarA,VarB)
X[,varC:=FALSE]
X[S["A",nomatch=0L]$i,varC:=TRUE]
X[S[J("AB","B"),nomatch=0L]$i,varC:=TRUE]
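To make the manual secondary key concrete, here is a small sketch on made-up data. The rows are hypothetical, and the lookups use nomatch=0L with $i to pull out the row numbers and join on ("AB", "B") to match the condition in the question. The point to notice is that X keeps its Year, ID key and row order throughout; only the small helper table S is sorted by VarA, VarB.

```r
library(data.table)

# Hypothetical toy data, keyed the way the question describes.
X <- data.table(
  Year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2012L),
  ID   = 1:6,
  VarA = c("A", "AB", "AB", NA, "B", "A"),
  VarB = c("B", "B", "C", "B", "B", NA)
)
setkey(X, Year, ID)

# S: the two lookup columns plus i, each row's position in X, keyed by VarA, VarB.
S <- setkey(X[, list(VarA, VarB, i = seq_len(.N))], VarA, VarB)

# Binary search in S yields row numbers; those row numbers index into X.
rows_A    <- S["A", nomatch = 0L]$i
rows_AB_B <- S[J("AB", "B"), nomatch = 0L]$i

X[, varC := FALSE]
X[rows_A, varC := TRUE]
X[rows_AB_B, varC := TRUE]

print(X)
print(key(X))   # X is still keyed by Year, ID
```

S has to be rebuilt whenever rows of X are added, removed or reordered, because the stored row numbers would otherwise go stale.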

Now clearly, that syntax is ugly. So FR#1007 (build in secondary keys) is a feature request to build that into the syntax; e.g.,

set2key(X,VarA,VarB)
X[...some way to specify which key to join to..., varC:=TRUE]

In the meantime it's possible, just manually, as shown above.
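For readers on a more recent data.table, something along these lines has since become available: the on= argument lets a join name its columns without touching the key, and setindex() builds a secondary index. The sketch below assumes a data.table version that provides both; the toy rows are hypothetical.

```r
library(data.table)

# Hypothetical toy data, keyed the way the question describes.
X <- data.table(
  Year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2012L),
  ID   = 1:6,
  VarA = c("A", "AB", "AB", NA, "B", "A"),
  VarB = c("B", "B", "C", "B", "B", NA)
)
setkey(X, Year, ID)

# Optional: a secondary index on (VarA, VarB); the key of X is not changed.
setindex(X, VarA, VarB)

X[, varC := FALSE]
X["A", varC := TRUE, on = "VarA"]                       # VarA == "A"
X[.("AB", "B"), varC := TRUE, on = c("VarA", "VarB")]   # VarA == "AB" and VarB == "B"

print(X)
print(key(X))   # still Year, ID
```

Because the key is never changed, there is no need to re-set it afterwards, which addresses the verbosity raised in Update 2.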
