Problem description
I am working on a program based on Jaccard Distance, and I need to calculate the Jaccard Distance between two binary bit vectors. I came across the following on the net:
If p1 = 10111 and p2 = 10011,
The total number of attributes for each combination of values in p1 and p2:
M11 = total number of attributes where p1 & p2 have a value 1,
M01 = total number of attributes where p1 has a value 0 & p2 has a value 1,
M10 = total number of attributes where p1 has a value 1 & p2 has a value 0,
M00 = total number of attributes where p1 & p2 have a value 0.
Jaccard similarity coefficient = J = intersection/union = M11/(M01 + M10 + M11)
= 3/(0 + 1 + 3) = 3/4,
Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4,
Or J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11)
= (0 + 1)/(0 + 1 + 3) = 1/4
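The arithmetic above can be checked with a short Python sketch. The function names `jaccard_counts` and `jaccard_distance` are hypothetical, the bit vectors are assumed to be equal-length strings of '0' and '1', and returning 0.0 when both vectors are all zeros is an assumed convention, not something stated in the question.

```python
def jaccard_counts(p1: str, p2: str) -> tuple[int, int, int, int]:
    """Return (M11, M01, M10, M00) for two equal-length bit strings."""
    if len(p1) != len(p2):
        raise ValueError("bit vectors must have the same length")
    pairs = list(zip(p1, p2))
    m11 = sum(a == "1" and b == "1" for a, b in pairs)
    m01 = sum(a == "0" and b == "1" for a, b in pairs)
    m10 = sum(a == "1" and b == "0" for a, b in pairs)
    m00 = sum(a == "0" and b == "0" for a, b in pairs)
    return m11, m01, m10, m00


def jaccard_distance(p1: str, p2: str) -> float:
    """Jaccard distance J' = (M01 + M10) / (M01 + M10 + M11); M00 is ignored."""
    m11, m01, m10, _m00 = jaccard_counts(p1, p2)
    denominator = m01 + m10 + m11
    if denominator == 0:
        # Assumed convention: two all-zero vectors are treated as identical.
        return 0.0
    return (m01 + m10) / denominator


print(jaccard_counts("10111", "10011"))    # (3, 0, 1, 1)
print(jaccard_distance("10111", "10011"))  # 0.25
```

Note that M00 = 1 for this pair, yet it plays no part in the result.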
Now, while calculating the coefficient, why was "M00" not included in the denominator? Can anyone please explain?
Recommended answer
The Jaccard coefficient is a measure for asymmetric binary attributes, that is, for scenarios where the presence of an item is more important than its absence.
Since M00 counts only the positions where both vectors record an absence, it is not considered when calculating the Jaccard coefficient.
For example, while checking for the presence/absence of a disease, the presence of the disease is the more significant outcome.
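To illustrate the effect of leaving out M00, here is a small Python sketch that compares the Jaccard coefficient with a measure that does count M00, the simple matching coefficient (M11 + M00)/(M00 + M01 + M10 + M11). The simple matching coefficient is not mentioned in the answer and is brought in here only for contrast; the function name `similarities` and the two sparse "disease" vectors are made up for the illustration.

```python
def similarities(p1: str, p2: str) -> tuple[float, float]:
    """Return (Jaccard coefficient, simple matching coefficient)."""
    if len(p1) != len(p2):
        raise ValueError("bit vectors must have the same length")
    pairs = list(zip(p1, p2))
    m11 = sum(a == "1" and b == "1" for a, b in pairs)
    m00 = sum(a == "0" and b == "0" for a, b in pairs)
    mismatches = len(pairs) - m11 - m00  # M01 + M10

    # Jaccard ignores M00; assumed convention of 1.0 when both are all zeros.
    jaccard = m11 / (m11 + mismatches) if (m11 + mismatches) else 1.0
    # Simple matching counts shared absences as agreement.
    simple_matching = (m11 + m00) / len(pairs)
    return jaccard, simple_matching


# Hypothetical data: each position is one disease, 1 = present.
# The two patients have no disease in common.
patient_a = "1000000000"
patient_b = "0100000000"
print(similarities(patient_a, patient_b))  # (0.0, 0.8)
```

With these made-up vectors the two patients share no disease, so the Jaccard coefficient is 0, while the measure that counts M00 reports 0.8 simply because both patients lack the same eight diseases.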
Hope that helps!