I'm trying to use FuzzyWuzzy to correct misspelled names in a text, but I can't get process.extract and process.extractOne to behave the way I expect.

from fuzzywuzzy import process

the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'

the_text = the_text.split()
found_word = process.extract(search_term, the_text)

print(found_word)


The result is:

[('e', 90), ('VEIGA', 80), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]


How can I get FuzzyWuzzy to correctly identify "VEIGA" as the right match?

Best answer

You can try fuzz.token_set_ratio or fuzz.token_sort_ratio as the scorer.
The answer to "When to use which fuzz function to compare 2 strings" gives a good explanation of the differences.
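Why the default call ranks 'e' first: if I read the fuzzywuzzy source correctly, process.extract defaults to scorer=fuzz.WRatio, and WRatio, when one processed string is at least 1.5 times as long as the other, also computes a partial (substring) score scaled by 0.9. 'e' is a substring of 'veyga', so its partial score is 100 and the reported score is 90. The sketch below is a simplified, stdlib-only reimplementation of that logic, not the library's code. The names wratio_sketch, partial_ratio_sketch, _process and _ratio are hypothetical; it uses difflib instead of python-Levenshtein, finds the partial match by brute force over every window, and skips the token-based terms WRatio also computes, which for single-word inputs like these never score higher.

```python
import difflib
import re


def _process(s: str) -> str:
    # Rough stand-in for fuzzywuzzy's full_process: non-word chars -> spaces, lowercase.
    return re.sub(r"\W+", " ", s).lower().strip()


def _ratio(a: str, b: str) -> float:
    # Plain similarity on a 0-100 scale (difflib instead of python-Levenshtein).
    return 100 * difflib.SequenceMatcher(None, a, b).ratio()


def partial_ratio_sketch(a: str, b: str) -> float:
    # Best ratio of the shorter string against every same-length window of the longer one.
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    n = len(short)
    return max(_ratio(short, long_[i:i + n]) for i in range(len(long_) - n + 1))


def wratio_sketch(a: str, b: str) -> int:
    p1, p2 = _process(a), _process(b)
    if not p1 or not p2:
        return 0
    base = _ratio(p1, p2)
    len_ratio = max(len(p1), len(p2)) / min(len(p1), len(p2))
    if len_ratio < 1.5:
        # Similar lengths: no partial matching, the plain ratio decides.
        return int(round(base))
    # Very different lengths: a substring match counts, scaled by 0.9 (0.6 above 8x).
    scale = 0.6 if len_ratio > 8 else 0.9
    return int(round(max(base, partial_ratio_sketch(p1, p2) * scale)))


the_text = 'VICTOR HUGO e MARIANA VEIGA'.split()
scores = sorted(((w, wratio_sketch('VEYGA', w)) for w in the_text),
                key=lambda pair: pair[1], reverse=True)
print(scores)
```

Run on the question's tokens, this sketch reproduces the scores shown in the question, with 'e' on top purely because of the substring bonus.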

For completeness, here is some code:

from fuzzywuzzy import process
from fuzzywuzzy import fuzz

the_text = 'VICTOR HUGO e MARIANA VEIGA'
search_term = 'VEYGA'

the_text = the_text.split()
found_word = process.extract(search_term, the_text, scorer=fuzz.token_sort_ratio)

print(found_word)


Output:

[('VEIGA', 80), ('e', 33), ('HUGO', 22), ('VICTOR', 18), ('MARIANA', 17)]
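Why token_sort_ratio fixes it: as I understand it, token_sort_ratio lowercases and cleans both strings, sorts their tokens, and then applies the plain ratio with no substring bonus, so 'e' against 'veyga' only gets 2·1/(1+5) ≈ 33. The minimal stdlib sketch below (token_sort_ratio_sketch, sort_tokens, simple_ratio and extract_sketch are hypothetical names; extract_sketch stands in for process.extract, and difflib stands in for python-Levenshtein) reproduces the output above:

```python
import difflib
import re


def simple_ratio(a: str, b: str) -> int:
    """Similarity on a 0-100 scale from difflib, rounded to an int."""
    return int(round(100 * difflib.SequenceMatcher(None, a, b).ratio()))


def sort_tokens(s: str) -> str:
    # Rough stand-in for fuzzywuzzy's full_process: non-word chars -> spaces, lowercase.
    cleaned = re.sub(r"\W+", " ", s).lower().strip()
    # Sort the tokens so word order no longer matters.
    return " ".join(sorted(cleaned.split()))


def token_sort_ratio_sketch(a: str, b: str) -> int:
    # Plain ratio on the sorted tokens: no substring (partial) bonus.
    return simple_ratio(sort_tokens(a), sort_tokens(b))


def extract_sketch(query, choices, scorer, limit=5):
    # Hypothetical stand-in for process.extract: score every choice, best first.
    scored = [(choice, scorer(query, choice)) for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


the_text = 'VICTOR HUGO e MARIANA VEIGA'.split()
result = extract_sketch('VEYGA', the_text, token_sort_ratio_sketch)
print(result)
```

With the real library, process.extractOne(search_term, the_text, scorer=fuzz.token_sort_ratio) should accordingly return ('VEIGA', 80), since it simply keeps the top-scoring choice.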

Regarding "python - Strange behavior in FuzzyWuzzy extract", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50468250/

10-09 16:05