Problem Description
We have a Solr core that has about 250 TrieIntFields (declared as dynamicField). There are about 14M docs in our Solr index and many documents have some value in many of these fields. We have a need to sort on all of these 250 fields over a period of time.
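For context, the setup being described would look roughly like the following schema.xml fragment (a minimal sketch; the relevance_* name pattern and the tint type name are assumptions based on the field names used in the queries below, not taken verbatim from the question):

<fieldType name="tint" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<dynamicField name="relevance_*" type="tint" indexed="true" stored="true"/>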
The issue we are facing is that the underlying Lucene fieldCache gets filled up very quickly. We have a 4 GB box and the index size is 18 GB. After a sort on 40 or 45 of these dynamic fields, the memory consumption is about 90% and we start getting OutOfMemory errors.
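As a rough back-of-the-envelope estimate (assuming the Lucene FieldCache holds one 4-byte int per document for every field that has been sorted on), that behaviour is about what you would expect:

14,000,000 docs × 4 bytes ≈ 56 MB per sorted field
45 sorted fields × 56 MB ≈ 2.5 GB

On a 4 GB box, alongside the rest of the JVM heap and the OS cache for an 18 GB index, that is enough to push memory use to around 90% and trigger OutOfMemory errors.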
For now, we have a cron job running every minute that restarts Tomcat if the total memory consumed is more than 80%.
From what I have read, I understand that restricting the number of distinct values in sortable Solr fields will bring down the fieldCache space. The values in these sortable fields can be any integer from 0 to 33000 and are quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?
UPDATE: We thought that if we did boosting instead of sorting, it wouldn't go to the fieldCache. So instead of issuing a query like
select?q=name:alba&sort=relevance_11 desc
we tried
select?q={!boost related_11}name:alba
but unfortunately boosting also populates the field cache :(
Recommended Answer
I think you have two options:
1) Add more memory.
2) Force Solr not to use the field cache by specifying facet.method=enum, as per the documentation.
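As a sketch, reusing the relevance_11 field name from the question, a facet request with that parameter would look like:

select?q=name:alba&facet=true&facet.field=relevance_11&facet.method=enum

The enum method walks the term enumeration and uses the filterCache rather than the fieldCache.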
There's also a solr-user mailing list thread discussing the same problem.
Unless your index is huge, I'd go with option 1). RAM is cheap these days.