Question
Could you please suggest the steps to follow for good Lucene performance, especially with large data (around 1 TB of PDF files to be indexed)?
Recommended answer
- Read the article "Scaling Lucene and Solr".
- Define your needs from Lucene (for example: you are indexing PDFs, so do you need to store the full text, only index it so that it is searchable, or neither?)
- Run a small-scale experiment: index a few documents and see whether retrieval is good enough.
- Try to index the whole corpus, applying the article's tips for fast indexing and for indexing for retrieval speed. Is retrieval good enough? Is performance good enough?
- Iterate.
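
One way to connect the small-scale experiment to the full run is to extrapolate indexing time from the sample before committing to the whole corpus. The following is only a rough sketch using plain Java, not Lucene API: the class `IndexingEstimate`, its methods, and the sample figures (a 1 GiB sample indexed in 5 minutes) are hypothetical, and it assumes throughput stays constant as the index grows, which real runs may not satisfy because of segment merges and PDF text extraction cost.

```java
import java.time.Duration;

/** Hypothetical helper: extrapolates full-corpus indexing time from a sample run. */
public final class IndexingEstimate {

    private IndexingEstimate() {}

    /**
     * Linear extrapolation from a measured sample.
     * Assumption: indexing throughput on the full corpus matches the sample.
     */
    public static Duration estimateTotal(long sampleBytes, Duration sampleElapsed, long corpusBytes) {
        if (sampleBytes <= 0 || corpusBytes <= 0
                || sampleElapsed.isZero() || sampleElapsed.isNegative()) {
            throw new IllegalArgumentException("sizes and elapsed time must be positive");
        }
        // How many times larger the corpus is than the sample.
        double scale = (double) corpusBytes / sampleBytes;
        double sampleSeconds = sampleElapsed.toNanos() / 1_000_000_000.0;
        // Round up so the estimate is never optimistic by a fraction of a second.
        return Duration.ofSeconds((long) Math.ceil(scale * sampleSeconds));
    }

    public static void main(String[] args) {
        long sampleBytes = 1L << 30;               // hypothetical: 1 GiB sample of PDFs
        Duration elapsed = Duration.ofMinutes(5);  // hypothetical: measured indexing time
        long corpusBytes = 1L << 40;               // roughly 1 TB, as in the question

        Duration total = estimateTotal(sampleBytes, elapsed, corpusBytes);
        System.out.println("Estimated indexing time (hours): " + total.toHours());
    }
}
```

If the estimate is unacceptable, that is the signal to revisit the earlier steps (what to store, how to batch, how many indexing threads) before the next iteration.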