Problem description
I'm looking for a gem or project that would let me identify that two names belong to the same person. For example:
J.R. Smith == John R. Smith == John Smith == John Roy Smith == Johnny Smith
I think you get the idea. I know nothing is going to be 100% accurate, but I'd like something that handles at least the majority of cases. I know that last one is probably going to need a database of nicknames.
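A minimal sketch of the nickname-database idea, combined with a rule that lets an initial match a full name. The `NICKNAMES` table, the `same_person?` helper and its matching rules (surnames must be identical, an initial matches any name starting with that letter, a missing middle name is ignored) are hypothetical choices for illustration, not the API of any existing gem:

```ruby
# Hypothetical nickname table: lowercase nickname => canonical first name.
# A real one would be far larger and loaded from a data file.
NICKNAMES = {
  "johnny" => "john", "jack" => "john",
  "bob" => "robert", "bill" => "william"
}.freeze

# "J.R. Smith" -> ["j", "r", "smith"]; nicknames are replaced by canonical names.
def name_parts(name)
  name.downcase.scan(/[a-z]+/).map { |part| NICKNAMES.fetch(part, part) }
end

# Two parts agree if they are equal, or if either one is a single-letter
# initial that matches the other's first letter.
def compatible?(x, y)
  return x == y if x.length > 1 && y.length > 1

  x[0] == y[0]
end

# Hypothetical rules: surnames must be identical, first names compatible,
# and middle names are only compared when both names have them.
def same_person?(name_a, name_b)
  a = name_parts(name_a)
  b = name_parts(name_b)
  return false if a.length < 2 || b.length < 2
  return false unless a.last == b.last && compatible?(a.first, b.first)

  a[1...-1].zip(b[1...-1]).all? { |x, y| y.nil? || compatible?(x, y) }
end

p same_person?("J.R. Smith", "John R. Smith")    # => true
p same_person?("John Roy Smith", "Johnny Smith") # => true
p same_person?("John Smith", "Jane Smith")       # => false
```

A real nickname table would also need to map one nickname to several names ("Al" can be Albert or Alan), which this one-to-one hash cannot express.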
Recommended answer
I think one option would be to use a Ruby implementation of the Levenshtein distance.
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.
Then you could treat names whose distance is less than some threshold X (a number you will have to tune) as belonging to the same person.
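A minimal sketch of that approach in plain Ruby with no gems: a two-row dynamic-programming Levenshtein distance, plus a hypothetical `similar_names?` helper that lowercases the names, strips punctuation and compares the distance against the threshold X. The default of 3 is an arbitrary assumption you would tune on real data:

```ruby
# Levenshtein distance via dynamic programming, keeping only two rows.
def levenshtein(a, b)
  prev = (0..b.length).to_a # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      deletion     = prev[j] + 1
      insertion    = curr[j - 1] + 1
      substitution = prev[j - 1] + (ca == cb ? 0 : 1) # free when chars match
      curr << [deletion, insertion, substitution].min
    end
    prev = curr
  end
  prev.last
end

# Lowercase, drop punctuation, collapse spaces: "J.R. Smith" -> "jr smith".
def normalize(name)
  name.downcase.gsub(/[^a-z ]/, "").squeeze(" ").strip
end

# Hypothetical helper: names count as the same person when their distance
# is below the threshold X (3 here is an arbitrary starting value to tune).
def similar_names?(a, b, threshold: 3)
  levenshtein(normalize(a), normalize(b)) < threshold
end

p levenshtein("kitten", "sitting")              # => 3
p similar_names?("Johnny Smith", "John Smith")  # => true (distance 2)
p similar_names?("J.R. Smith", "John R. Smith") # => false (distance 4)
```

The initials case from the question fails here: "jr smith" and "john r smith" are four insertions apart, so edit distance on its own is best suited to typos and small spelling variants.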
EDIT: With a little searching I found another algorithm, based on phonetics, called Metaphone.
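A full Metaphone implementation is too long to show here, so this sketch swaps in American Soundex, an older and simpler phonetic algorithm, to illustrate the same idea: encode each name part by how it sounds and compare the codes. `soundex` and `sounds_alike?` are hypothetical helper names; Metaphone follows the same pattern but applies many more English spelling rules (for example, treating "ph" as "f"):

```ruby
# Map each consonant to its Soundex digit; vowels, y, h and w have no code.
SOUNDEX_CODES = {
  "bfpv" => "1", "cgjkqsxz" => "2", "dt" => "3",
  "l" => "4", "mn" => "5", "r" => "6"
}.flat_map { |letters, digit| letters.chars.map { |ch| [ch, digit] } }.to_h.freeze

def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "")
  return "" if letters.empty?

  first = letters[0]
  digits = []
  last_code = SOUNDEX_CODES[first] # so a repeat of the first letter's code is skipped
  letters[1..-1].each_char do |ch|
    code = SOUNDEX_CODES[ch]
    if code
      digits << code unless code == last_code # collapse adjacent equal codes
      last_code = code
    elsif !"hw".include?(ch)
      last_code = nil # vowels and y separate equal codes; h and w do not
    end
  end
  (first.upcase + digits.join)[0, 4].ljust(4, "0")
end

# Compare names part by part: "Jon Smyth" -> ["J500", "S530"].
def sounds_alike?(a, b)
  codes = ->(name) { name.scan(/[A-Za-z]+/).map { |part| soundex(part) } }
  codes.call(a) == codes.call(b)
end

p soundex("Smith")                          # => "S530"
p sounds_alike?("Jon Smyth", "John Smith")  # => true
```

Phonetic codes also merge names that are clearly different: "Jane" encodes to J500 just like "John", "Jon" and "Johnny", so a phonetic key works better as one signal among several than as a verdict on its own.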
It still has a lot of holes, but I think in this case the best anyone can do is give you alternatives to test so you can see what works best.