How do you detect misspellings in a product search and suggest possible corrections?

Question

Given a very large database of product names, how would you detect possible typos in user searches and suggest possible corrections (kind of like the way Google presents them)?

For example:

The user enters "fork handels" and presses 'search'.

They get back:

"No results. Did you mean 'fork handles'?"

Recommended answer

There are several approaches to this problem:


  1. Keep a table of the most popular misspellings in your database, seeded from a published list of common misspellings if you need a starting point.
  2. Use an algorithm based on the edit distance. In information theory and computer science, the edit distance between two strings of characters is the number of operations required to transform one of them into the other. There are several different algorithms to define or calculate this metric; the Wikipedia article on the Levenshtein algorithm is a good place to start.
  3. If you are using Lucene for full-text search, the "Did you mean" feature can be implemented on top of it.
  4. If you see the feature as simple spell correction, there are some nice, very short implementations in several languages based on the essay "How to Write a Spelling Corrector".
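As a rough illustration of approaches 1 and 2 combined, here is a minimal Python sketch. It is not taken from the original answer: the `COMMON_MISSPELLINGS` table, the function names, the sample vocabulary, and the `max_distance` threshold of 2 are all assumptions made for the example. It checks the misspelling table first and falls back to a Levenshtein edit distance scan over the known product words.

```python
from typing import Iterable, Optional

# Hypothetical table of known misspellings (approach 1); in practice this
# would probably live in the database and be built from search logs.
COMMON_MISSPELLINGS = {"handels": "handles"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, vocabulary: Iterable[str],
                 max_distance: int = 2) -> Optional[str]:
    """Return a suggested replacement for `word`, or None if none is needed or found."""
    vocab = set(vocabulary)
    if word in vocab:
        return None
    if word in COMMON_MISSPELLINGS:
        return COMMON_MISSPELLINGS[word]
    # Linear scan: fine for a sketch, likely too slow for a very large catalog.
    best = min(vocab, key=lambda w: (levenshtein(word, w), w), default=None)
    if best is not None and levenshtein(word, best) <= max_distance:
        return best
    return None


def did_you_mean(query: str, vocabulary: Iterable[str],
                 max_distance: int = 2) -> Optional[str]:
    """Return a corrected query string, or None if nothing would change."""
    vocab = set(vocabulary)
    words = query.lower().split()
    corrected = [correct_word(w, vocab, max_distance) or w for w in words]
    return " ".join(corrected) if corrected != words else None
```

With a vocabulary such as `{"fork", "handles", "spoon", "knife"}`, calling `did_you_mean("fork handels", vocabulary)` returns `"fork handles"`, which could then be shown in the "Did you mean" message. The linear scan over the vocabulary is the weak point for a very large product database; a real system would probably need some kind of index over the candidate words rather than comparing against every one.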

