Problem description
This problem has had me stumped for the whole day.
I have two Japanese strings that I want to fuzzy match in Python 2.7. Currently I'm using fuzzywuzzy:
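# -*- coding: utf-8 -*-
from fuzzywuzzy import process

# Python 2.7: u"..." literals are Unicode; .encode('utf-8') turns them into byte strings
jpnStr = u"日本語".encode('utf-8')
jpnList = [u"日本語1".encode('utf-8'), u"日本語2".encode('utf-8'), u"日本語3".encode('utf-8')]
bestmatch = process.extractOne(jpnStr, jpnList)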
But the best match always ends up being
("日本語1",0)
How would I go about resolving this issue, or is there a best practice that I'm totally missing here? Sorry if I sound frustrated; this has been a roadblock for a while. Thanks in advance.
Recommended answer
OK, I'm not sure how helpful this is, but I've found a workaround.
I found that I could fuzzy match Japanese strings using fuzzywuzzy.
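- First, get the Japanese string as Unicode, e.g. "日本語です".
- Then write it out to a text file as ASCII text. The output will look something like "/uf34/ufeac/uewa3/..." and so on.
- Then read the text file back and compare the ASCII representations of the Japanese strings ("/uf34/ufeac/uewa3/") with each other. This gives a workable fuzzy match rating.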
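The steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not the answerer's actual code: it targets Python 3 rather than 2.7, it skips the intermediate text file and converts in memory with the standard `unicode_escape` codec, and it swaps fuzzywuzzy for the standard library's `difflib.SequenceMatcher` so that it runs without extra packages. The helper names `to_ascii_escapes` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher


def to_ascii_escapes(text):
    """Return an ASCII-only escaped form of a Unicode string (e.g. backslash-u65e5 for 日)."""
    return text.encode("unicode_escape").decode("ascii")


def best_match(query, choices):
    """Return (choice, ratio) for the choice whose escaped form is closest to the escaped query."""
    escaped_query = to_ascii_escapes(query)
    scored = [
        (choice, SequenceMatcher(None, escaped_query, to_ascii_escapes(choice)).ratio())
        for choice in choices
    ]
    # max() keeps the first entry when several choices tie on the ratio
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(to_ascii_escapes("日本語"))
    print(best_match("日本語", ["日本語1", "日本語2", "日本語3"]))
```

One caveat of comparing escaped text: each Japanese character becomes six ASCII characters, so the similarity is computed over hex digits, and two unrelated characters that happen to share digits can look more alike than they are.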
It's probably not an ideal method, but it works and is fairly accurate. Hope this helps somebody.