Question
I'm using the 'agrep' function in R, which returns a vector of matches. I would like a function similar to agrep that returns only the best match, or the best matches if there are ties. Currently I do this by applying the 'sdists()' function from the 'cba' package to each element of the resulting vector, but this seems very redundant.
Edit: here is the function I'm currently using. I'd like to speed it up, since calculating the distance twice seems redundant.
library(cba)  # provides sdists()

word <- 'test'
words <- c('Teest', 'teeeest', 'New York City', 'yeast', 'text', 'Test')

ClosestMatch <- function(string, StringVector) {
  # agrep() returns every approximate match (first distance pass)
  matches <- agrep(string, StringVector, value = TRUE)
  # sdists() scores each candidate again (second distance pass)
  distance <- as.numeric(sdists(string, matches, method = "ow", weight = c(1, 0, 2)))
  # keep all candidates tied for the smallest distance
  matches[distance == min(distance)]
}

ClosestMatch(word, words)
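One way to compute the distance only once, sketched here under the assumption that base R is acceptable, is to skip the agrep() pre-filter and score every candidate with adist() from the utils package, then keep all ties. Note that this swaps in a different technique: adist() uses unit-cost generalized Levenshtein distance on whole strings by default, not the question's sdists() weights, and agrep() matches approximate substrings, so the candidate set and ranking can differ. The name ClosestMatchBase is hypothetical.

```r
# Hypothetical helper: one distance computation with adist(), all ties kept.
ClosestMatchBase <- function(string, StringVector, ignore.case = FALSE) {
  if (length(StringVector) == 0) return(character(0))
  # adist() returns a 1 x n matrix here; take its only row as a vector
  d <- adist(string, StringVector, ignore.case = ignore.case)[1, ]
  # keep every candidate tied for the smallest distance (NAs are skipped)
  StringVector[which(d == min(d, na.rm = TRUE))]
}

words <- c('Teest', 'teeeest', 'New York City', 'yeast', 'text', 'Test')
ClosestMatchBase('test', words)                      # "text" "Test"
ClosestMatchBase('test', words, ignore.case = TRUE)  # "Test"
```

With case-sensitive matching, 'text' and 'Test' are both one edit away from 'test', so both are returned; ignoring case makes 'Test' an exact match.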
Recommended answer
The RecordLinkage package was removed from CRAN; use stringdist instead:
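library(stringdist)

ClosestMatch2 <- function(string, stringVector) {
  # amatch() returns the index of the closest element; maxDist = Inf means
  # a match is always found. On ties, only the first closest element is returned.
  stringVector[amatch(string, stringVector, maxDist = Inf)]
}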
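The amatch() call above returns a single element: according to the stringdist documentation, when several elements share the smallest distance, the first one is returned. If all tied best matches are wanted, as the question asks, a minimal sketch is to compute the distances with stringdist() (which recycles the single query string against the vector) and keep every minimum. The function name ClosestMatches and the "osa" default (which matches amatch()'s default method) are choices made for this illustration.

```r
library(stringdist)

# Hypothetical helper: like ClosestMatch2, but keeps every tied best match.
ClosestMatches <- function(string, stringVector, method = "osa") {
  # distance from `string` to each element of `stringVector`
  d <- stringdist(string, stringVector, method = method)
  # keep every element tied for the smallest distance (NAs are skipped)
  stringVector[which(d == min(d, na.rm = TRUE))]
}

words <- c('Teest', 'teeeest', 'New York City', 'yeast', 'text', 'Test')
ClosestMatches('test', words)  # "text" "Test"
```

For the same input, ClosestMatch2('test', words) would return only "text", the first of the two elements at distance 1.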