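I am developing a program based on Jaccard distance, and I need to compute the Jaccard distance between two binary bit vectors. I came across the following online: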

 If p1 = 10111 and p2 = 10011,

 The total number of attributes for each combination of values in p1 and p2:

 M11 = total number of attributes where p1 & p2 have a value 1,
 M01 = total number of attributes where p1 has a value 0 & p2 has a value 1,
 M10 = total number of attributes where p1 has a value 1 & p2 has a value 0,
 M00 = total number of attributes where p1 & p2 have a value 0.
 Jaccard similarity coefficient = J =
 intersection/union = M11/(M01 + M10 + M11)
 = 3 / (0 + 1 + 3) = 3/4,

 Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4,
 Or J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11)
 = (0 + 1)/(0 + 1 + 3) = 1/4
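
As a quick check of the arithmetic above, here is a minimal Python sketch. The function names `jaccard_counts` and `jaccard_distance` are hypothetical, the bit vectors are assumed to be equal-length strings of `0` and `1`, and returning 0.0 for two all-zero vectors is an assumption of this sketch (the formula itself is undefined there).

```python
def jaccard_counts(p1: str, p2: str):
    """Return (M11, M01, M10, M00) for two equal-length bit strings."""
    if len(p1) != len(p2):
        raise ValueError("bit vectors must have the same length")
    m11 = m01 = m10 = m00 = 0
    for a, b in zip(p1, p2):
        if a == "1" and b == "1":
            m11 += 1
        elif a == "0" and b == "1":
            m01 += 1
        elif a == "1" and b == "0":
            m10 += 1
        else:
            m00 += 1
    return m11, m01, m10, m00


def jaccard_distance(p1: str, p2: str) -> float:
    """Jaccard distance (M01 + M10) / (M01 + M10 + M11); M00 is not used."""
    m11, m01, m10, _m00 = jaccard_counts(p1, p2)
    union = m01 + m10 + m11
    if union == 0:
        # Assumption: two all-zero vectors are treated as identical.
        return 0.0
    return (m01 + m10) / union


print(jaccard_counts("10111", "10011"))    # (3, 0, 1, 1)
print(jaccard_distance("10111", "10011"))  # 0.25
```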

Now, when computing the coefficient, why is M00 not included in the denominator? Can anyone explain?

Best answer

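The Jaccard index of A and B is |A∩B| / |A∪B| = |A∩B| / (|A| + |B| - |A∩B|).
Here A is the set of attributes where p1 is 1 and B is the set of attributes where p2 is 1, so we have: |A∩B| = M11, |A| = M11 + M10, |B| = M11 + M01.
So |A∩B| / (|A| + |B| - |A∩B|) = M11 / (M11 + M10 + M11 + M01 - M11) = M11 / (M10 + M01 + M11).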
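Thinking of this as a Venn diagram may help: M11 is the overlap of A and B, M10 and M01 are the parts of A and B outside the overlap, and M00 counts the attributes that lie outside both sets. Since those attributes are not in A∪B, M00 never enters the denominator.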
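
The set view can also be sketched in Python. This is only an illustration under stated assumptions: the helper names `ones` and `jaccard_index` are hypothetical, bit vectors are strings of `0` and `1`, and returning 1.0 when both sets are empty is a choice made for this sketch. It shows that positions where both vectors are 0 belong to neither set, so adding more of them leaves the index unchanged.

```python
def ones(bits: str) -> set:
    """Set of positions (0-based) where the bit string has a 1."""
    return {i for i, bit in enumerate(bits) if bit == "1"}


def jaccard_index(p1: str, p2: str) -> float:
    """|A ∩ B| / |A ∪ B| for the sets of 1-positions of p1 and p2."""
    a, b = ones(p1), ones(p2)
    union = a | b
    if not union:
        # Assumption: two all-zero vectors are treated as identical.
        return 1.0
    return len(a & b) / len(union)


p1, p2 = "10111", "10011"
a, b = ones(p1), ones(p2)

# Positions where both vectors are 0 (the M00 positions) are in neither set.
both_zero = set(range(len(p1))) - (a | b)
print(both_zero)                # {1}
print(jaccard_index(p1, p2))    # 0.75

# Appending shared zeros increases M00 but does not change the index.
print(jaccard_index(p1 + "000", p2 + "000"))  # 0.75
```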

A similar question on this topic (why zero matches are not included when computing the Jaccard distance between binary numbers) can be found on Stack Overflow: https://stackoverflow.com/questions/43518507/
