Problem description
I'm currently using difflib's get_close_matches to iterate through a list of 15,000 strings and find the closest match for each against another list of approximately 15,000 strings:
import difflib

a = ['blah', 'pie', 'apple'...]
b = ['jimbo', 'zomg', 'pie'...]
for value in a:
    matches = difflib.get_close_matches(value, b, n=1, cutoff=0.85)
It takes 0.58 seconds per value, which means the full loop will take 8,714 seconds, or about 145 minutes. Is there another library/method that might be faster, or a way to improve the speed of this approach? I've already tried converting both lists to lower case, but that only produced a slight speedup.
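Before switching libraries, one stdlib-only speedup is to spread the loop across CPU cores. A minimal sketch, assuming the lists a and b from the question; the helper names and the process count are illustrative, not part of the original code:

```python
import difflib
from functools import partial
from multiprocessing import Pool

def closest(value, candidates):
    """Return the single best match for value in candidates, or None."""
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=0.85)
    return matches[0] if matches else None

def match_all(a, b, processes=4):
    """Match every string in a against b, one worker per chunk of a."""
    with Pool(processes) as pool:
        return pool.map(partial(closest, candidates=b), a)

if __name__ == "__main__":
    a = ['blah', 'pie', 'apple']
    b = ['jimbo', 'zomg', 'pie']
    print(match_all(a, b))
```

This keeps difflib's per-pair cost the same but divides the wall-clock time by roughly the number of cores; it does not change the underlying O(N) scan per query.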
Recommended answer
fuzzyset indexes strings by their bigrams and trigrams, so it finds approximate matches in O(log(N)) rather than difflib's O(N). For my fuzzyset of 1M+ words and word pairs, it computes the index in about 20 seconds and finds the closest match in under 100 ms.
That concludes this Q&A on better fuzzy-matching performance; we hope the recommended answer is helpful.