This article looks at where to find performance benchmarks for Apache Lucene/Solr on large datasets.

Problem description

Are there any links/resources pointing to performance benchmarks for Lucene/Solr on large datasets, i.e. datasets in the 500 GB to 5 TB range?

Thanks

Recommended answer

Lucene committer Mike McCandless runs benchmarks on a regular basis to track performance improvements and regressions. They are built from Wikipedia exports, which may be a bit smaller than what you are looking for.

However, performance doesn't depend so much on the raw input size as on the number of documents and the number of unique terms. If you already have some data similar to what you will need to index, I would recommend checking out Mike's test tool, adapting it to your needs, and running it against your own dataset on your own hardware to find out what kind of performance numbers you can expect.
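As a rough illustration of that advice, below is a minimal sketch of a do-it-yourself indexing-throughput benchmark using the plain Lucene Java API (this is not Mike McCandless's benchmark tool itself). The index path /tmp/bench-index, the corpus file /tmp/corpus.txt (assumed to hold one document body per line), and the field name "body" are hypothetical placeholders; swap in your own data, fields, and hardware.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class IndexingBenchmark {
    public static void main(String[] args) throws IOException {
        // Analyzer and writer settings; a larger RAM buffer generally helps bulk indexing throughput.
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setRAMBufferSizeMB(256);

        long docs = 0;
        long start = System.nanoTime();
        // Hypothetical paths: index destination and a corpus file with one document body per line.
        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/bench-index"));
             IndexWriter writer = new IndexWriter(dir, config);
             BufferedReader reader = Files.newBufferedReader(Paths.get("/tmp/corpus.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Document doc = new Document();
                doc.add(new TextField("body", line, Field.Store.NO));
                writer.addDocument(doc);
                docs++;
            }
            writer.commit();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Indexed %d docs in %.1f s (%.0f docs/s)%n", docs, seconds, docs / seconds);
    }
}

Query latency can be measured the same way by timing a batch of representative searches against the resulting index; the point is simply to benchmark against data whose document count and term distribution resemble your own.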

