Problem description
I'm trying to update my code from Lucene 3.4 to 4.1. I have figured out all of the changes except one. I have code that needs to iterate over all term values for one field. In Lucene 3.1 there was an IndexReader#terms() method providing a TermEnum, which I could iterate over. This seems to have changed in Lucene 4.1, and even after several hours of searching the documentation I cannot figure out how to do it. Can someone please point me in the right direction?
Thanks.
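Recommended answer
Please follow the Lucene 4 Migration Guide:
How you obtain the enums has changed. The primary entry point is the Fields class. If you know your reader is a single-segment reader, do this: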
Fields fields = reader.fields();
if (fields != null) {
...
}
If the reader might be multi-segment, you must do this:
Fields fields = MultiFields.getFields(reader);
if (fields != null) {
...
}
fields may be null (e.g. if the reader has no fields).
Note that the MultiFields approach entails a performance hit on MultiReaders, as it must merge terms/docs/positions on the fly. It's generally better to instead get the sequential readers (use oal.util.ReaderUtil) and then step through those readers yourself, if you can (this is how Lucene drives searches).
If you pass a SegmentReader to MultiFields.getFields it will simply return reader.fields(), so there is no performance hit in that case.
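To make the cost of merging "on the fly" more concrete, here is a conceptual sketch in plain Java. It does not use any Lucene classes: each segment is modeled as a sorted list of strings, and the class and method names (SegmentTermMerge, mergedTerms, perSegmentTerms) are made up for illustration. Lucene compares terms as bytes, so string ordering is a simplification. The merged view pays for a priority-queue operation on every step, while the per-segment walk simply reads each sorted list in turn.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Conceptual model only: plain Java lists stand in for Lucene segments. */
final class SegmentTermMerge {

    /** One cursor per segment, positioned on that segment's current term. */
    private static final class Cursor implements Comparable<Cursor> {
        final Iterator<String> rest;
        String current;

        Cursor(Iterator<String> terms) {
            this.rest = terms;
            this.current = terms.next(); // caller guarantees a non-empty segment
        }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            current = rest.next();
            return true;
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /** Merged view: every step goes through a priority queue over all segments. */
    static List<String> mergedTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<Cursor>();
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        List<String> merged = new ArrayList<String>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            String term = smallest.current;
            // The same term may occur in several segments; emit it only once.
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(term)) {
                merged.add(term);
            }
            if (smallest.advance()) {
                queue.add(smallest); // re-insert at its new position
            }
        }
        return merged;
    }

    /** Per-segment view: each sorted list is walked on its own, with no merging. */
    static List<String> perSegmentTerms(List<List<String>> segments) {
        List<String> visited = new ArrayList<String>();
        for (List<String> segment : segments) {
            visited.addAll(segment);
        }
        return visited;
    }
}
```

With the two segments [apple, cherry] and [banana, cherry, date], mergedTerms yields apple, banana, cherry, date, while perSegmentTerms visits apple, cherry, banana, cherry, date. The per-segment walk is cheaper, but the caller has to cope with terms that repeat across segments and with the lack of a global order.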
Once you have a non-null Fields you can do this:
Terms terms = fields.terms("field");
if (terms != null) {
...
}
terms may be null (e.g. if the field does not exist).
Once you have a non-null terms you can get an enum like this:
TermsEnum termsEnum = terms.iterator(null); // in Lucene 4.x, iterator takes a TermsEnum to reuse; null requests a new one
The returned TermsEnum will not be null.
You can then iterate through the TermsEnum.
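Putting the steps together, a minimal sketch of the whole loop could look like the following. It assumes Lucene 4.x on the classpath; the class name FieldTermLister and the method listTerms are hypothetical names chosen for this example. In the 4.x API, Terms.iterator takes a TermsEnum to reuse, and passing null asks for a fresh one. The sketch uses MultiFields.getFields for simplicity, so it accepts the merging cost described above when the reader has several segments.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

/** Hypothetical helper, written against the Lucene 4.x API. */
public final class FieldTermLister {

    /** Returns every term of the given field as a string, in index order. */
    public static List<String> listTerms(IndexReader reader, String field) throws IOException {
        List<String> result = new ArrayList<String>();

        Fields fields = MultiFields.getFields(reader);
        if (fields == null) {
            return result; // the reader has no fields
        }

        Terms terms = fields.terms(field);
        if (terms == null) {
            return result; // the field does not exist
        }

        TermsEnum termsEnum = terms.iterator(null); // null: no TermsEnum to reuse
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // The BytesRef may be reused by the enum, so convert it right away.
            result.add(term.utf8ToString());
        }
        return result;
    }
}
```

Calling FieldTermLister.listTerms(reader, "field") on an open reader would return the terms of that field. utf8ToString is only meaningful for terms that were indexed as text; other term encodings would need their own decoding.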