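I am trying to index about 612 records with Lucene 4.10.2. It creates a large number of CFS files in the index directory, roughly 626 in total, and indexing takes longer than expected. Each CFS file is at most 3 KB.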

Environment: Java 8, Windows 7



// Open the index directory on disk.
Directory dir = FSDirectory.open(file);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_2, new ClassicAnalyzer());
// Use the caller-supplied RAM buffer size unless it is unset (0 or -1).
if (bufferSizeMB != 0 && bufferSizeMB != -1) {
    config.setRAMBufferSizeMB(bufferSizeMB);
} else {
    config.setRAMBufferSizeMB(DEFAULT_RAM_BUFFER_SIZE_MB);
}
config.setMaxBufferedDocs(1000);
config.setMaxBufferedDeleteTerms(1000);
config.setMergePolicy(new LogDocMergePolicy());
IndexWriter iwriter = new IndexWriter(dir, config);
// The calls below repeat settings already applied to config above.
iwriter.getConfig().setMaxBufferedDeleteTerms(1000);
iwriter.getConfig().setMaxBufferedDocs(1000);
// Note: this applies bufferSizeMB unconditionally, bypassing the default fallback above.
iwriter.getConfig().setRAMBufferSizeMB(bufferSizeMB);


http://lucene.472066.n3.nabble.com/Multiple-CFS-files-are-generated-in-lucene-4-10-2-td4176336.html

Best answer

From the Lucene change log:

  LUCENE-4462: DocumentsWriter now flushes deletes, segment infos and builds
  CFS files if necessary during segment flush and not during publishing. The latter
  was a single threaded process while now all IO and CPU heavy computation is done
  concurrently in DocumentsWriterPerThread.


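Because flushing now happens per segment, merges are triggered according to your merge policy. Ideally, once indexing finishes cleanly and the writer is closed, only a single CFS file should remain.

That is what I observe in my application.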
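
To make the effect of a merge policy on the segment count more concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class `LogMergeSketch` and its methods are hypothetical names, and the rule "merge whenever the last `mergeFactor` segments hold the same number of documents" is only a simplified approximation of how a log-style policy such as LogDocMergePolicy groups segments. It assumes every flush writes a one-document segment and uses a merge factor of 10.

```java
import java.util.ArrayList;
import java.util.List;

public class LogMergeSketch {

    /** Simulates `flushes` one-document flushes and returns the doc count of each surviving segment. */
    static List<Integer> simulate(int flushes, int mergeFactor) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(1); // each flush writes a new one-document segment
            // Cascade: keep merging while the newest mergeFactor segments are equally sized.
            while (tailIsMergeable(segments, mergeFactor)) {
                int merged = 0;
                for (int k = 0; k < mergeFactor; k++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
            }
        }
        return segments;
    }

    /** True when at least mergeFactor segments exist and the newest mergeFactor all have the same size. */
    static boolean tailIsMergeable(List<Integer> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        int last = segments.get(n - 1);
        for (int k = n - mergeFactor; k < n; k++) {
            if (segments.get(k) != last) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // prints [100, 100, 100, 100, 100, 100, 10, 1, 1]
        System.out.println(simulate(612, 10));
    }
}
```

In this model, 612 tiny flushes end up as 9 segments rather than 612. If a real index shows roughly one CFS file per record, that may suggest merges are not getting a chance to run or that old files are not being cleaned up, though the toy model cannot confirm either.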

Update in response to a comment

I recently migrated from 2.x to 4.10.2 myself.

Quoting the IndexWriter 4.10.2 documentation for commit():

Commits all pending changes (added & deleted documents, segment merges, added indexes,
etc.) to the index, and syncs all referenced index files, such that a reader will see
the changes and the index updates will survive an OS or machine crash or power loss.
Note that this does not wait for any running background merges to finish. This may
be a costly operation, so you should test the cost in your application and do it only
when really necessary.


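What you can do instead is use a single IndexWriter and add all records through that one writer, without calling commit after each record. Once every record has been added, just call indexwriter.close() to complete the merge and commit.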
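
The difference between the two commit patterns can be illustrated with a small stand-alone sketch. This is a toy model in plain Java rather than Lucene code: `ToyWriter` is a hypothetical stand-in for an index writer, and it assumes that every commit flushes the buffered documents into one new segment and that no merging happens in between.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitFrequencyDemo {

    /** Toy stand-in for an index writer; this is NOT Lucene's IndexWriter. */
    static final class ToyWriter {
        private int buffered = 0;
        private final List<Integer> segments = new ArrayList<>();

        void addDocument() {
            buffered++; // documents are only buffered in memory here
        }

        void commit() {
            // Assumption: a commit flushes all buffered documents into one new segment.
            if (buffered > 0) {
                segments.add(buffered);
                buffered = 0;
            }
        }

        void close() {
            commit(); // closing flushes whatever is still buffered
        }

        int segmentCount() {
            return segments.size();
        }
    }

    /** Commits after every added document, then closes. */
    static int commitPerDocument(int docs) {
        ToyWriter writer = new ToyWriter();
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
            writer.commit();
        }
        writer.close();
        return writer.segmentCount();
    }

    /** Adds every document first and closes the writer once at the end. */
    static int commitOnce(int docs) {
        ToyWriter writer = new ToyWriter();
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
        }
        writer.close();
        return writer.segmentCount();
    }

    public static void main(String[] args) {
        System.out.println("commit per document: " + commitPerDocument(612) + " segments");
        System.out.println("single close at end: " + commitOnce(612) + " segments");
    }
}
```

Under these assumptions, committing per record leaves 612 one-document segments, while a single close leaves one. With the maxBufferedDocs value of 1000 from the question, 612 documents would plausibly fit in a single buffer in the real writer as well, provided the RAM buffer limit is not reached first.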

08-03 16:50