What algorithm gives suggestions in a spell checker?

Question

What algorithm is typically used when implementing a spell checker that also offers word suggestions?

At first I thought it might make sense to check each newly typed word (if it is not found in the dictionary) against every word in the dictionary by Levenshtein distance and return the closest matches. However, this seems highly inefficient, since the entire dictionary has to be evaluated again for every misspelled word.
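For reference, the naive approach described in the question can be sketched roughly as follows. The function names `levenshtein` and `naive_suggest` and the parameter `k` are made up for this illustration; the point is only that every lookup touches every dictionary entry.

```python
import heapq


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def naive_suggest(word, dictionary, k=3):
    """Return the k dictionary words closest to `word` (hypothetical helper).

    Every call computes the distance to every dictionary entry, which is
    exactly the inefficiency the question is worried about.
    """
    return heapq.nsmallest(k, dictionary, key=lambda w: levenshtein(word, w))
```

With a dictionary of n words this costs n distance computations per misspelled word, which is what the approaches in the answer try to avoid.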

How is this typically done?

Recommended answer

There is a good essay by Peter Norvig on how to implement a spelling corrector. It is basically a brute-force approach that tries candidate strings within a given edit distance. (There are also tips on improving the spelling corrector's performance using a Bloom filter and faster candidate hashing.)
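A minimal sketch of this candidate-generation idea, loosely in the spirit of Norvig's essay but not a copy of it: generate every string one edit away from the input and keep the ones that are real words. The names `edits1` and `candidates`, and the `known_words` set, are assumptions made for this example. Norvig's corrector additionally ranks candidates by word frequency, which is left out here.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, known_words):
    """Known words at edit distance 0, then 1, then 2; else the word itself.

    `known_words` is assumed to be a set, so membership tests are O(1).
    """
    if word in known_words:
        return {word}
    near = edits1(word) & known_words
    if near:
        return near
    far = {e2 for e1 in edits1(word) for e2 in edits1(e1)} & known_words
    return far or {word}
```

The work per lookup depends on the length of the typed word and the alphabet size, not on the size of the dictionary, which is why this brute-force method is practical.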

The requirements for a plain spell checker are weaker: you only have to find out whether a word is in the dictionary. You can use a Bloom filter to build a spell checker that consumes less memory. An ancient version, described in Programming Pearls by Jon Bentley, uses 64 KB for an English dictionary.
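As an illustration of the membership-only check, here is a small Bloom filter sketch. The class name, the default sizes, and the use of salted SHA-256 as the hash family are all choices made for this example, not details taken from Bentley's book or any particular library. A Bloom filter never reports a stored word as missing, but it can occasionally accept a word that was never added, so a rare misspelling may slip through.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions (illustrative sketch)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True means "probably in the dictionary"; False means "definitely not".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))
```

Usage would look like `bf = BloomFilter(); bf.add("hello"); "hello" in bf`. The memory used is fixed by `num_bits` regardless of how long the words are, which is where the saving comes from.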

A BK-tree is an alternative approach. A nice article is here.
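To make the BK-tree idea concrete, here is a compact sketch. The class layout, the tuple-based nodes, and the helper `levenshtein` are assumptions for this example. Each child edge is labelled with the distance between child and parent, and a search only descends into edges whose label lies within `max_dist` of the query's distance to the current node. That pruning relies on the triangle inequality, so the distance function has to be a true metric.

```python
def levenshtein(a, b):
    """Edit distance with insert, delete, substitute (a true metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class BKTree:
    """Each node is a tuple (word, {edge_distance: child_node})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of `word`."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Triangle inequality: only these edges can lead to matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

For small `max_dist` most subtrees are skipped, so a lookup typically evaluates the distance function against only a fraction of the dictionary.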

Levenshtein distance is not exactly the right edit distance for a spell checker. It knows only insertion, deletion and substitution. Transposition is missing: swapping two adjacent characters costs 2 (one deletion plus one insertion), although it is a single typing mistake. Damerau–Levenshtein distance, which counts such a swap as one edit, is the right edit distance.
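The difference can be seen in a short sketch. Note that this is the restricted variant of Damerau–Levenshtein, usually called optimal string alignment, which is simpler to write but does not allow a substring to be edited more than once. The function name `osa_distance` is chosen for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Extra case: the last two characters are swapped.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For the common typo "teh" versus "the", this returns 1, whereas plain Levenshtein distance gives 2.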

08-10 22:02