Problem description
I'm running multiple websites, each with its own content and design, from the same middleware, and I want to use Solr as the search engine. The sites differ in domain but not in internal structure (meaning the actual database and data structures are identical between the sites).
The question now is - is it better to store the data for all sites in a single Solr index and separate it by a "site" field, or to use a separate Solr core for each site within a single JVM?
What will provide the best performance (there are no cross-site queries)? What will provide the best recall and precision (I'm worried about loss of precision because of IDF factors - differences in content domains are quite large)?
Solution
I assume you are more worried about what happens when your sites grow. IMO, multiple cores seem a better choice.
Single large index: All updates and queries hit a single index. When it starts getting slow, you have to build a cluster, by sharding or replication, to hold your large index. It is also a single point of failure, and backing up the index will be tough.
Multiple cores: If one site is growing and dwarfing others, you can easily migrate it to a different server, ensuring that no servers are overloaded. Backing up individual sites will be relatively trivial.
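To make the two layouts concrete, here is a minimal sketch of how the request URLs might differ. The host, port, the shared core name `allsites`, and the idea of naming each core after its site are assumptions for illustration; the `select` handler, the `fq` filter parameter, and the replication handler's `command=backup` are standard Solr features.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host and port


def shared_index_query(site: str, text: str, core: str = "allsites") -> str:
    """Single index: every query is restricted with a filter on the site field."""
    params = urlencode({"q": text, "fq": f"site:{site}"})
    return f"{SOLR_BASE}/{core}/select?{params}"


def per_site_core_query(site: str, text: str) -> str:
    """One core per site: the core name in the path selects the site."""
    params = urlencode({"q": text})
    return f"{SOLR_BASE}/{site}/select?{params}"


def per_site_backup(site: str) -> str:
    """One core per site: back up just that site's core via its replication handler."""
    params = urlencode({"command": "backup"})
    return f"{SOLR_BASE}/{site}/replication?{params}"
```

With one core per site, the middleware only has to pick the core from the request's domain, and moving a site to another server is a matter of changing the base URL for that core.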
Multiple cores will make your life simpler while your sites are not busy. As your sites grow, you can put off clustering and performance tuning until later.
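On the IDF concern from the question: document frequencies are collected over the whole index, and a filter on the "site" field does not change them, so a shared index mixes the term statistics of all sites. The sketch below uses the textbook formula idf = ln(N / df), not Solr's exact similarity, and the document counts are made up purely for illustration.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Textbook inverse document frequency: ln(N / df)."""
    return math.log(num_docs / doc_freq)


# Hypothetical corpora: (total documents, documents containing "engine")
cars = (1_000, 500)    # "engine" is common on a car site
cooking = (9_000, 9)   # and rare on a cooking site

# A shared index sees the sum of both sites' statistics.
combined = (cars[0] + cooking[0], cars[1] + cooking[1])

idf_cars_only = idf(*cars)      # weight of "engine" in a per-site core
idf_combined = idf(*combined)   # weight of "engine" in the shared index

print(f"per-site core: {idf_cars_only:.3f}, shared index: {idf_combined:.3f}")
```

In this made-up case the shared index treats "engine" as much rarer than it really is on the car site, so the term gets more weight there than it would in a per-site core. Separate cores keep each site's statistics independent.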