本文介绍了自动完成的后端的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一道面试题:设计一个自动完成的分布式后端.

This is an interview question: design a distributed back-end for auto-complete.

我会回答如下:

自动完成是通过给定的后缀在字典中进行搜索.字典可能应该组织为 trie.字典是根据最频繁的查询构建的,但这是另一回事.

Auto-complete is a search in a dictionary by a given suffix. The dictionary should be probably organized as a trie. The dictionary is built from the most frequent queries but it's another story.

现在我假设字典不会经常更改(例如每天一次而不是每毫秒一次).因此,我们可以在多个处理自动完成查询的服务器上复制字典(例如,使用负载平衡器和循环策略).

Now I assume the dictionary is not changed frequently (e.g. once a day rather than every millisecond). Thus we can just replicate the dictionary across a number of servers that handle auto-complete queries (e.g. with a load balancer and round-robin policy).

我们也应该考虑字典,但这也是另一回事.

We should also think about dictionary but this is also another story.

有意义吗?我错过了什么吗?

Does it make sense? Am I missing something?

推荐答案

看看什么SOLR4.0(solr 有 trie 并且是分布式的).这在很大程度上取决于他们期望自动完成功能的工作方式.如果它只是一个通配符过滤器,那么像trie这样的东西对于简单来说就可以了ASCII ...否则,如果他们想要自动更正,它会变得更加复杂.话虽如此,我怀疑如果 Trie 是一个通用字段(即不是 SKU 或专用 ID),它会给您带来好的结果,否则您将拥有一个非常大且效率低下的 trie.

Take a look at what SOLR 4.0 (solr has trie's and is distributed). Its highly dependent on how they expect the autocomplete to work. If its just a wild card filter than something like a trie will be fine for simple ASCII... otherwise its gets more complicated if they want auto-correction. That being said I doubt a trie will get you good results if its a generic field (ie not a SKU or specialized ID) otherwise you will have a monstrously large and inefficient trie.

看看:

09-22 16:15