Question
I'm aware of SOUNDEX and (double) Metaphone, but these don't let me test for the similarity of words as a whole - for example "Hi" sounds very similar to "Bye", but both of these methods will mark them as completely different.
Are there any libraries in Ruby, or any methods you know of, that are capable of determining the similarity between two words? (Either a boolean is/isn't similar, or numerical 40% similar)
Edit: Extra bonus points if there is an easy method to 'drop in' a different dialect or language!
Recommended answer
I think you're describing Levenshtein distance. And yes, there are gems for that. If you prefer pure Ruby, go for the text gem.
$ gem install text
The docs have more details, but here's the crux of it:
Text::Levenshtein.distance('test', 'test') # => 0
Text::Levenshtein.distance('test', 'tent') # => 1
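For reference, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal pure-Ruby sketch of the standard dynamic-programming algorithm. It is not the text gem's implementation, and the method name `levenshtein` is hypothetical.

```ruby
# Hypothetical helper: classic Wagner-Fischer dynamic programming.
# prev[j] holds the distance between the first (i - 1) chars of a
# and the first j chars of b.
def levenshtein(a, b)
  prev = (0..b.length).to_a # distance from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j] + 1,        # deletion
        curr[j - 1] + 1,    # insertion
        prev[j - 1] + cost  # substitution (or match when cost is 0)
      ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein('test', 'test') # prints 0
puts levenshtein('test', 'tent') # prints 1
```

Only two rows of the table are kept at a time, so memory grows with the length of the second word rather than with the product of both lengths.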
If you're ok with native extensions...
$ gem install levenshtein
Its usage is similar. Its performance is very good. (It handles ~1000 spelling corrections per minute on my systems.)
If you need to know how similar two words are, divide the distance by the word length (for example, the length of the longer word) to get a normalized score.
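A minimal sketch of that normalization, assuming you divide by the longer word's length so the score falls in 0.0..1.0 (1.0 means identical). It is gem-free so it runs standalone; `edit_distance`, `similarity`, and `similar_enough?` are hypothetical names, and the 0.6 cutoff is an arbitrary assumption you would tune for your data. Comparison is case-sensitive.

```ruby
# Hypothetical helper: Levenshtein distance using a single DP row.
# Before row[j] is overwritten it holds distance(a[0, i - 1], b[0, j]).
def edit_distance(a, b)
  row = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    diag, row[0] = row[0], i # diag = previous row's value at column j - 1
    b.each_char.with_index(1) do |cb, j|
      above = row[j]
      row[j] = [above + 1, row[j - 1] + 1, diag + (ca == cb ? 0 : 1)].min
      diag = above
    end
  end
  row.last
end

# Score in 0.0..1.0: 1 - distance / length of the longer word.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero? # two empty strings count as identical
  1.0 - edit_distance(a, b).to_f / longest
end

# Boolean variant; min_score = 0.6 is an assumed, tunable cutoff.
def similar_enough?(a, b, min_score = 0.6)
  similarity(a, b) >= min_score
end

puts similarity('test', 'tent')               # prints 0.75
puts (similarity('test', 'tent') * 100).round # prints 75 (percent)
puts similar_enough?('test', 'tent')          # prints true
```

Normalizing by the longer word keeps the score independent of argument order and stops long words from looking "less similar" just because they have more characters to get wrong.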
If you want a simple similarity test, consider something like this:
Untested, but straightforward:
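require 'text'

# Adds String#similar?: true when the Levenshtein distance
# between self and other is at most threshold edits.
String.module_eval do
  def similar?(other, threshold = 2)
    distance = Text::Levenshtein.distance(self, other)
    distance <= threshold
  end
end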