
Question

Could you please suggest the steps to follow for good Lucene performance, especially with large data (around 1 TB of PDF files to be indexed)?

Recommended answer


  1. Read the article Scaling Lucene and Solr.
  2. Define what you need from Lucene (for example, since you are indexing PDFs: do you need to store the full text, only index it so that it is searchable, or not keep it at all?)
  3. Run a small-scale experiment - index a few documents and see whether retrieval is good enough.
  4. Try to index the whole collection (applying that article's tips for fast indexing and for indexing for retrieval speed) - Is retrieval good enough? Is performance good enough?
  5. Iterate.
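
Steps 3 and 4 can be connected with a rough back-of-the-envelope estimate: time the small-scale run, then extrapolate to the full collection before committing to it. The sketch below is only an illustration in plain Java. The class name `IndexingEstimate`, both helper methods, and the sample figures (2 GiB indexed in 400 seconds) are hypothetical and do not come from the answer. It also assumes throughput scales linearly, which may not hold once segment merging and PDF text extraction (which has to happen outside Lucene) come into play.

```java
public class IndexingEstimate {

    /** Throughput in bytes per second observed on a small sample run. */
    public static double throughputBytesPerSec(long sampleBytes, double elapsedSeconds) {
        if (sampleBytes <= 0 || elapsedSeconds <= 0) {
            throw new IllegalArgumentException("sample size and elapsed time must be positive");
        }
        return sampleBytes / elapsedSeconds;
    }

    /** Hours needed to index totalBytes at the given rate, assuming linear scaling. */
    public static double estimateHours(long totalBytes, double bytesPerSec) {
        if (totalBytes < 0 || bytesPerSec <= 0) {
            throw new IllegalArgumentException("invalid total size or throughput");
        }
        return totalBytes / bytesPerSec / 3600.0;
    }

    public static void main(String[] args) {
        // Hypothetical measurements from a small-scale experiment.
        long sampleBytes = 2L * 1024 * 1024 * 1024;   // 2 GiB of PDFs indexed
        double elapsedSeconds = 400.0;                // wall-clock time for the sample

        // Size of the full collection from the question (about 1 TB, taken here as 1 TiB).
        long totalBytes = 1024L * 1024 * 1024 * 1024;

        double rate = throughputBytesPerSec(sampleBytes, elapsedSeconds);
        double hours = estimateHours(totalBytes, rate);

        System.out.printf("throughput: %.1f MiB/s, estimated full run: %.1f hours%n",
                rate / (1024.0 * 1024.0), hours);
    }
}
```

If the estimate is too high, that is a signal to go back to step 2 and revisit what is stored and indexed, or to apply the indexing tips before attempting the full run.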


07-29 11:21