Question
Is there a general way to convert between a measure of similarity and a measure of distance?
Consider a similarity measure like the number of 2-grams that two strings have in common.
2-grams('beta', 'delta') = 1
2-grams('apple', 'dappled') = 4
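A minimal sketch of the 2-gram similarity above. It assumes each string is reduced to its set of distinct adjacent-character pairs, because the question doesn't say whether a repeated 2-gram counts more than once. The names `bigrams` and `bigram_overlap` are hypothetical.

```python
def bigrams(s: str) -> set[str]:
    """Return the set of distinct adjacent-character pairs (2-grams) in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_overlap(a: str, b: str) -> int:
    """Similarity: number of distinct 2-grams that a and b have in common."""
    return len(bigrams(a) & bigrams(b))


print(bigram_overlap("beta", "delta"))     # 1  (shared: "ta")
print(bigram_overlap("apple", "dappled"))  # 4  (shared: "ap", "pp", "pl", "le")
```

Using sets reproduces both values in the question. A multiset (for example `collections.Counter`) would be the alternative if repeated 2-grams should count separately.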
What if I need to feed this to an optimization algorithm that expects a measure of difference, like Levenshtein distance?
This is just an example. I'm looking for a general solution, if one exists. For instance, how would you go from Levenshtein distance to a measure of similarity?
I appreciate any guidance you may offer.
Recommended answer
Let d denote distance and s denote similarity. To convert a distance measure to a similarity measure, first normalize d to [0, 1] using d_norm = d / max(d), where max(d) is the largest distance among the items being compared. The similarity measure is then:
s = 1 - d_norm
where s is in the range [0, 1]. A value of 1 denotes the highest similarity (the items being compared are identical), and 0 denotes the lowest similarity (the largest distance).
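A minimal Python sketch of the normalization above. It assumes max(d) is taken over the list of distances being converted, not over a theoretical maximum. The Levenshtein function is a standard two-row dynamic-programming version written here only to produce example distances. All function names are hypothetical. The last function applies the same idea in the other direction (d = 1 - s / max(s)) to turn a similarity such as the question's 2-gram count into a distance. That inversion is an extension of the answer, not part of it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), DP over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def distances_to_similarities(d: list[float]) -> list[float]:
    """s = 1 - d / max(d): maps distances into [0, 1], where 1 means identical."""
    d_max = max(d)
    if d_max == 0:  # every pair is identical; avoid dividing by zero
        return [1.0] * len(d)
    return [1.0 - x / d_max for x in d]


def similarities_to_distances(s: list[float]) -> list[float]:
    """Assumed reverse direction: d = 1 - s / max(s), where 0 means most similar."""
    s_max = max(s)
    if s_max == 0:  # nothing is similar; treat every pair as maximally distant
        return [1.0] * len(s)
    return [1.0 - x / s_max for x in s]


pairs = [("beta", "delta"), ("apple", "dappled"), ("kitten", "sitting")]
dists = [levenshtein(a, b) for a, b in pairs]          # [2, 2, 3]
sims = distances_to_similarities(dists)
print([round(x, 3) for x in sims])                     # [0.333, 0.333, 0.0]
print(similarities_to_distances([1, 4]))               # [0.75, 0.0]
```

Because the scale comes from max(d) within the data, the resulting similarities are relative: adding a new, more distant pair changes every existing score.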