python - Python difflib's ratio, quick_ratio and real_quick_ratio

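I have been using difflib's SequenceMatcher, and I found that the ratio function is too slow.
From reading the documentation, I learned that quick_ratio and real_quick_ratio are supposed to be faster (as the names suggest) and to serve as upper bounds.

However, the documentation does not describe the assumptions they make or the speedup they provide.

When should I use each version, and what do I give up by doing so?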

Best answer

Take a look at the source code.

Start with the helper method _calculate_ratio:

    def _calculate_ratio(matches, length):
        if length:
            return 2.0 * matches / length
        return 1.0

ratio
ratio finds the matching elements, doubles their count, and divides by the combined length of the two strings:

    return _calculate_ratio(matches, len(self.a) + len(self.b))
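As a rough illustration of where `matches` comes from, the sketch below rebuilds the value of ratio by hand from get_matching_blocks. The sample strings and the variable name `manual` are assumptions chosen for the example; they are not part of difflib.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
sm = SequenceMatcher(None, a, b)

# Sum the sizes of all matching blocks; the final block is a
# zero-size sentinel, so it does not change the total.
matches = sum(block.size for block in sm.get_matching_blocks())

# Same formula as _calculate_ratio: 2 * matches / (len(a) + len(b)).
manual = 2.0 * matches / (len(a) + len(b))

print(manual, sm.ratio())
```

Finding the matching blocks is the expensive step; the division itself costs nothing.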

quick_ratio

This is what the source comment says:

    # viewing a and b as multisets, set matches to the cardinality
    # of their intersection; this counts the number of matches
    # without regard to order, so is clearly an upper bound

It then returns:

    return _calculate_ratio(matches, len(self.a) + len(self.b))
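The multiset idea can be sketched with collections.Counter. This is not how difflib implements quick_ratio internally; `multiset_bound` is a hypothetical helper written only to show the same counting rule, assuming the sequence elements are hashable.

```python
from collections import Counter
from difflib import SequenceMatcher

def multiset_bound(a, b):
    """Hypothetical helper: 2 * |multiset intersection| / (len(a) + len(b))."""
    total = len(a) + len(b)
    if not total:
        return 1.0
    # Counter & Counter keeps the minimum count of each element,
    # which is the multiset intersection; order is ignored.
    common = sum((Counter(a) & Counter(b)).values())
    return 2.0 * common / total

a, b = "abc", "cba"
sm = SequenceMatcher(None, a, b)

# Same characters in a different order: the multiset bound is perfect,
# while ratio, which respects order, is lower.
print(multiset_bound(a, b), sm.quick_ratio(), sm.ratio())
```

Because order is ignored, anagrams always score 1.0 here, which is the main thing given up relative to ratio.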

real_quick_ratio
real_quick_ratio takes the length of the shorter string, doubles it, and divides by the combined length of the two strings:

    la, lb = len(self.a), len(self.b)
    # can't have more matches than the number of elements in the
    # shorter sequence
    return _calculate_ratio(min(la, lb), la + lb)

This is an upper bound as well, and the loosest of the three values, because it looks only at the lengths.
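The relationship ratio <= quick_ratio <= real_quick_ratio can be checked on a few inputs. The helper `three_scores` and the sample pairs below are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def three_scores(a, b):
    """Hypothetical helper: return (ratio, quick_ratio, real_quick_ratio)."""
    sm = SequenceMatcher(None, a, b)
    return sm.ratio(), sm.quick_ratio(), sm.real_quick_ratio()

for a, b in [("abcd", "bcde"), ("abc", "cba"), ("abc", "xyz")]:
    r, q, rq = three_scores(a, b)
    # Each cheaper method can only overestimate the more exact one.
    assert r <= q <= rq
    print(f"{a!r} vs {b!r}: ratio={r:.3f} quick={q:.3f} real_quick={rq:.3f}")

# "abc" vs "xyz" share no characters, yet real_quick_ratio is 1.0
# because the two strings have the same length.
```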

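Conclusion
real_quick_ratio never checks whether the strings match; it computes an upper bound from the string lengths alone.

Now, I am no algorithm expert, but if you find ratio too slow for your needs, I suggest using quick_ratio, since it addresses the problem well enough.

A note on efficiency

From the docstring: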

    .ratio() is expensive to compute if you haven't already computed
    .get_matching_blocks() or .get_opcodes(), in which case you may
    want to try .quick_ratio() or .real_quick_ratio() first to get an
    upper bound.
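One way to act on that advice is to test the cheap upper bounds first and compute ratio only when both pass. The sketch below assumes a similarity threshold; the function name `similar_enough` and the default `cutoff` of 0.6 are illustrative choices. difflib.get_close_matches orders its checks the same way, as far as I can tell from its source.

```python
from difflib import SequenceMatcher

def similar_enough(a, b, cutoff=0.6):
    """Hypothetical helper: True if ratio() >= cutoff, checking cheap bounds first."""
    sm = SequenceMatcher(None, a, b)
    # `and` short-circuits: if an upper bound is already below the cutoff,
    # ratio() cannot reach it, so the expensive call is skipped.
    return (sm.real_quick_ratio() >= cutoff
            and sm.quick_ratio() >= cutoff
            and sm.ratio() >= cutoff)

print(similar_enough("abcd", "bcde"))
print(similar_enough("abc", "xyz"))
print(similar_enough("a", "a" * 100))
```

The result is identical to checking ratio alone; only the amount of work changes. The gain depends on how many pairs the bounds reject, so it is worth measuring on real data.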

A similar question about Python difflib's ratio, quick_ratio and real_quick_ratio can be found on Stack Overflow: https://stackoverflow.com/questions/50487058/