Problem description
My understanding is that in any scalable product such as Amazon eCommerce or Google, autocomplete/search over text or items can be implemented, at a high level, as follows:
Elasticsearch (ES) based approach
Documents are stored in a DB. Once persisted, they are handed to Elasticsearch, which creates the index and stores the index/documents (based on the tokenizer) in memory or on disk, depending on configuration.
Once the user types, say, 3 characters, ES searches the indexes (which can be configured to index n-grams as well), ranks the matches by weight and returns them to the user.
However, after reading more on the topic, it looks like some scalable products also use a trie data structure for prefix-based search.
My question is: can a trie-based approach be a good alternative to ES, does ES use a trie internally, or am I missing something completely here?
Recommended answer
ES autocompletion can be achieved in three ways:
- using prefix queries
- using (edge-)ngrams
- or using the completion suggester
The first option is the poor man's completion feature. I'm mentioning it because it can be useful in certain situations, but you should avoid it if you have a substantial number of documents.
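As a rough illustration of the first option, the body of a `_search` request using a prefix query could be built as below. This is only a sketch: the field name `title`, the `size` value and the helper `build_prefix_query` are hypothetical and not part of the original answer.

```python
import json


def build_prefix_query(field: str, typed: str, size: int = 5) -> dict:
    """Build the body of a _search request that uses a prefix query.

    `field` and `size` are illustrative assumptions.
    """
    return {
        "size": size,
        # Prefix queries are term-level and not analyzed, so the input is
        # lowercased here on the assumption that the field was indexed
        # with a lowercasing analyzer.
        "query": {"prefix": {field: {"value": typed.lower()}}},
    }


body = build_prefix_query("title", "Ela")
print(json.dumps(body, indent=2))
```

The dict would be sent as the JSON body of a search request against the index holding the documents.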
The second option uses the conventional ES indexing features, i.e. it tokenizes the text, indexes all (edge-)ngrams, and then lets you search for any prefix/infix/suffix that has been indexed.
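To make the second option more concrete, the sketch below shows what edge n-grams of a token look like, plus one possible index definition using an `edge_ngram` token filter. The analyzer and filter names (`autocomplete`, `autocomplete_filter`), the field name `title` and the gram sizes are assumptions chosen for illustration.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> List[str]:
    """Return the prefixes of `token` whose length is in [min_gram, max_gram]."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Hypothetical index body: edge n-grams are produced at index time, while the
# search side uses the standard analyzer so the typed text is not n-grammed.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "standard",
            }
        }
    },
}

print(edge_ngrams("elastic", min_gram=2, max_gram=5))
# ['el', 'ela', 'elas', 'elast']
```

With such a mapping, an ordinary `match` query on `title` would match documents whose tokens start with the typed text, at the cost of a larger index.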
The third option uses a different approach and is optimized for speed. Basically, when indexing a field of type completion, ES will create a "finite state transducer" and store it in memory for ultra fast access.
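A minimal sketch of the completion suggester, assuming a field called `suggest` and a suggestion name `item-suggest` (both hypothetical), could look like this. It only builds the mapping, a sample document and the request body as Python dicts; sending them to a cluster is left out.

```python
# Mapping with a field of type `completion` (field name is an assumption).
completion_mapping = {
    "mappings": {"properties": {"suggest": {"type": "completion"}}}
}

# Example document: `input` lists the strings to complete on, `weight`
# influences ranking among the suggestions.
sample_doc = {
    "suggest": {"input": ["Elasticsearch", "Elastic Stack"], "weight": 10}
}


def build_suggest_request(typed: str, field: str = "suggest", size: int = 5) -> dict:
    """Build the body of a _search request that asks for completions of `typed`."""
    return {
        "suggest": {
            "item-suggest": {  # arbitrary name chosen for this sketch
                "prefix": typed,
                "completion": {"field": field, "size": size},
            }
        }
    }


request_body = build_suggest_request("ela")
print(request_body)
```

The completion suggester matches from the start of each `input` string, which is why it suits prefix-style autocompletion rather than infix search.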
A finite state transducer is close to a trie in terms of implementation. You can check this excellent article, which shows how a trie compares to a finite state transducer.
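For comparison, a plain in-memory trie with prefix completion can be sketched as follows. This is a simplified illustration of the data structure the question asks about, not how ES is implemented; the class and method names are made up for this example.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Add `word`, creating one node per character as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` stored words starting with `prefix`, alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results: List[str] = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so they are popped alphabetically.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = Trie()
for term in ["elastic", "elasticsearch", "elk", "trie"]:
    trie.insert(term)

print(trie.complete("el"))
# ['elastic', 'elasticsearch', 'elk']
```

A trie shares only common prefixes between words. A finite state transducer can, in addition, share common suffixes and attach an output (such as a weight) to each accepted string, which generally makes it more compact for large term sets.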
UPDATE (June 25th, 2019):
ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
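As a sketch of how this data type might be used, the snippet below builds a mapping and a matching query as Python dicts. The field name `title` and the helper name are assumptions; the `bool_prefix` multi-match over the `._2gram` and `._3gram` subfields follows the pattern described on the linked documentation page.

```python
# Mapping with a search_as_you_type field (field name is an assumption).
sayt_mapping = {
    "mappings": {"properties": {"title": {"type": "search_as_you_type"}}}
}


def build_search_as_you_type_query(typed: str, field: str = "title") -> dict:
    """Build a _search body that queries the field and its shingle subfields."""
    return {
        "query": {
            "multi_match": {
                "query": typed,
                # bool_prefix treats the last term of the input as a prefix.
                "type": "bool_prefix",
                "fields": [field, f"{field}._2gram", f"{field}._3gram"],
            }
        }
    }


sayt_query = build_search_as_you_type_query("elastic sea")
print(sayt_query)
```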