Problem description
I'm wondering whether there is a way to do fuzzy string matching in PHP: looking for a word in a long string and finding a potential match even if it is misspelled, for example off by one character because of an OCR error.
I was thinking a regex generator might be able to do it. Given an input of "crazy", it would generate this regex:
.*((crazy)|(.+razy)|(c.+azy)|(cr.+zy)|(cra.+y)|(craz.+)).*
It would then return all matches for that word or variations of that word.
How to build the generator: I would probably split the search word into an array of characters, then loop over that array with foreach and, for each position in turn, build an alternative in which the character at that position is replaced with ".+".
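The generator described above could be sketched roughly as follows. This is only an illustration of the idea in the question, not a recommended solution: the function name buildFuzzyPattern is hypothetical, the word is assumed to be plain single-byte ASCII, and the surrounding ".*" is dropped because preg_match already searches anywhere in the subject.

```php
<?php
// Hypothetical helper (not part of PHP): builds a pattern that matches $word
// itself, or any variant in which one character position is replaced by ".+".
// Assumes $word is single-byte ASCII, since str_split() works on bytes.
function buildFuzzyPattern(string $word): string
{
    $chars = str_split($word);

    // First alternative: the exact word, escaped for use inside /.../.
    $alternatives = [preg_quote($word, '/')];

    // One further alternative per position, with that position wildcarded.
    foreach ($chars as $i => $unused) {
        $variant = '';
        foreach ($chars as $j => $ch) {
            $variant .= ($i === $j) ? '.+' : preg_quote($ch, '/');
        }
        $alternatives[] = $variant;
    }

    // Case-insensitive, unanchored pattern.
    return '/(' . implode('|', $alternatives) . ')/i';
}

echo buildFuzzyPattern('crazy'), PHP_EOL;
// prints: /(crazy|.+razy|c.+azy|cr.+zy|cra.+y|craz.+)/i
```

Note that ".+" is greedy and matches one or more characters, so an alternative such as "c.+azy" can span far more than a single word in a long string. Replacing ".+" with "." would restrict each variant to a single substituted character.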
Is this a good way to do fuzzy text search, or is there a better way? What about some kind of string comparison that gives me a score based on how close the match is? In short, I'm trying to see whether some badly converted OCR text contains a given word.
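For the score-based comparison asked about here, one option is PHP's built-in levenshtein() function, which returns the edit distance between two strings. A minimal sketch, assuming the search word is known in advance and the OCR text can be split into words on non-alphanumeric characters; the function name findCloseWords and the default threshold of 1 are hypothetical choices for illustration:

```php
<?php
// Hypothetical helper (not part of PHP): returns the words in $text whose
// edit distance to $needle is at most $maxDistance, as word => distance,
// with the closest matches first.
function findCloseWords(string $text, string $needle, int $maxDistance = 1): array
{
    $matches = [];

    // Split the text into words on anything that is not a letter or digit.
    $words = preg_split('/[^A-Za-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // Compare case-insensitively; a lower distance means a closer match.
        $distance = levenshtein(strtolower($word), strtolower($needle));
        if ($distance <= $maxDistance) {
            $matches[$word] = $distance;
        }
    }

    asort($matches); // sort by distance, keeping the word keys
    return $matches;
}

print_r(findCloseWords('He drove like a crezy man', 'crazy'));
```

Here "crezy" differs from "crazy" by one substitution, so it is returned with a distance of 1, while the other words fall outside the threshold.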
Recommended answer
String distance functions are useless when you don't know what the right word is. I'd suggest the pspell functions:
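// Open an English dictionary (requires the pspell extension).
$p = pspell_new("en");
// Print the suggested spellings for the misspelled word "crazzy".
print_r(pspell_suggest($p, "crazzy"));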
http://www.php.net/manual/zh/function.pspell-suggest.php