Problem Description
We have a Solr core that has about 250 TrieIntFields (declared as dynamicField). There are about 14M docs in our Solr index and many documents have some value in many of these fields. We have a need to sort on all of these 250 fields over a period of time.
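For context, the setup being described would look roughly like the following schema.xml fragment (a minimal sketch; the relevance_* name pattern and the tint type name are assumptions based on the field names used in the queries below, not taken verbatim from the question):

<fieldType name="tint" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<dynamicField name="relevance_*" type="tint" indexed="true" stored="true"/>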
The issue we are facing is that the underlying Lucene fieldCache gets filled up very quickly. We have a 4 GB box and the index size is 18 GB. After a sort on 40 or 45 of these dynamic fields, the memory consumption is about 90% and we start getting OutOfMemory errors.
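As a rough back-of-the-envelope estimate (assuming the Lucene FieldCache holds one 4-byte int per document for every field that has been sorted on), that behaviour is about what you would expect:

14,000,000 docs × 4 bytes ≈ 56 MB per sorted field
45 sorted fields × 56 MB ≈ 2.5 GB

On a 4 GB box, alongside the rest of the JVM heap and the OS cache for an 18 GB index, that is enough to push memory use to around 90% and trigger OutOfMemory errors.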
For now, we have a cron job running every minute that restarts Tomcat if the total memory consumed is more than 80%.
From what I have read, I understand that restricting the number of distinct values in sortable Solr fields will bring down the fieldCache space. The values in these sortable fields can be any integer from 0 to 33000 and are quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?
UPDATE: We thought that if we did boosting instead of sorting, it wouldn't go to the fieldCache. So instead of issuing a query like
select?q=name:alba&sort=relevance_11 desc
we tried
select?q={!boost related_11}name:alba
but unfortunately boosting also populates the field cache :(
Recommended Answer
I think you have two options:
1) Add more memory.
2) Force Solr not to use the field cache by specifying facet.method=enum, as per the documentation.
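As a sketch, reusing the relevance_11 field name from the question, a facet request with that parameter would look like:

select?q=name:alba&facet=true&facet.field=relevance_11&facet.method=enum

The enum method walks the term enumeration and uses the filterCache rather than the fieldCache.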
There's also a solr-user mailing list thread discussing the same problem.
Unless your index is huge, I'd go with option 1). RAM is cheap these days.