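I am trying to get the inverted index from Lucene 5.3.0. When I try to get the positions of a specific term with the code below, I find that dpe (the DocsAndPositionsEnum) is null. Here is the implementation.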


  Implementation in the function createIndex:


FieldType myFieldType = new FieldType(TextField.TYPE_STORED);
myFieldType.setStoreTermVectors(true);
myFieldType.storeTermVectorPositions();
document.add(new Field("contents", content, myFieldType));



  Code snippet from the function showIndex:


Document doc = reader.document(docNum);
System.out.println("Processing file:"+doc.get("filename"));

Terms termVector = reader.getTermVector(docNum, "contents");
System.out.println("termVector is null?"+String.valueOf(termVector==null));
TermsEnum itr = termVector.iterator();
BytesRef term = null;

while((term = itr.next()) != null){
    try{
        DocsAndPositionsEnum dpe = itr.docsAndPositions(null, null);
        System.out.println(dpe==null);
        int freq = -1;
        if (dpe != null) freq = dpe.freq();
        System.out.println(freq);
        for (int fi = 0; fi< freq; fi++){
            final int position = dpe.nextPosition();
            System.out.println("position: "+ String.valueOf(position));
        }

        String termText = term.utf8ToString();
        Term termInstance = new Term("contents",term);
        long termFreq = reader.totalTermFreq(termInstance);
        long docCount = reader.docFreq(termInstance);

        System.out.println("term: "+termText+", termFreq = "+termFreq+", docCount = "+docCount);
    }catch(Exception e){
        System.out.println(e);
    }
}

Could you help me solve this problem?
Many thanks!

Best Answer

storeTermVectorPositions is a getter, so calling it changes nothing. You must use setStoreTermVectorPositions in the createIndex function:

myFieldType.setStoreTermVectorPositions(true);
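
To make the getter/setter mix-up concrete, here is a minimal sketch in plain Java. ToyFieldType is a hypothetical stand-in written only for this illustration; it is not Lucene's FieldType and merely mimics the naming pattern of the two methods.

```java
// Hypothetical stand-in for illustration only; this is NOT Lucene's FieldType.
class ToyFieldType {
    private boolean storeTermVectorPositions = false;

    // Getter: reports the current setting and changes nothing.
    boolean storeTermVectorPositions() {
        return storeTermVectorPositions;
    }

    // Setter: the call that actually enables the option.
    void setStoreTermVectorPositions(boolean value) {
        this.storeTermVectorPositions = value;
    }
}

class GetterVsSetterDemo {
    public static void main(String[] args) {
        ToyFieldType type = new ToyFieldType();
        type.storeTermVectorPositions();          // result discarded, flag stays false
        System.out.println(type.storeTermVectorPositions());
        type.setStoreTermVectorPositions(true);   // flag is now true
        System.out.println(type.storeTermVectorPositions());
    }
}
```

In the question's code the same thing happens: the getter call compiles without complaint, but positions are never stored in the term vector, which is why no positions enum is available later.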


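Normally you have to iterate over a DocsAndPositionsEnum on two levels, first over documents and then over positions, but since this is a term vector there will be only one document. You still have to position the DocsAndPositionsEnum on that document, by calling nextDoc(), before you can read the frequency and the positions: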

DocsAndPositionsEnum dpe = itr.docsAndPositions(null, null);
// A term vector acts like a single-document index, so the doc ID here is 0, not docNum.
int docId = dpe.nextDoc();
assert docId == 0;
int freq = dpe.freq();
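
The two-level traversal can be sketched with a toy model in plain Java. ToyPostings and its methods are hypothetical names invented for this illustration and are not Lucene APIs; the sketch only shows the shape of the iteration (outer loop over documents, inner loop over positions) and why a term vector collapses the outer loop to a single document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model for illustration only; none of these names are Lucene APIs.
class ToyPostings {
    // docId -> positions of one term in that document, ordered by docId
    private final TreeMap<Integer, List<Integer>> postings = new TreeMap<>();

    void add(int docId, int position) {
        postings.computeIfAbsent(docId, k -> new ArrayList<>()).add(position);
    }

    // Two-level walk: outer loop over documents, inner loop over positions.
    List<String> walk() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> doc : postings.entrySet()) {
            int freq = doc.getValue().size();   // term frequency within this document
            for (int i = 0; i < freq; i++) {
                out.add("doc=" + doc.getKey() + " pos=" + doc.getValue().get(i));
            }
        }
        return out;
    }
}

class TwoLevelWalkDemo {
    public static void main(String[] args) {
        // A full index may hold the term in several documents.
        ToyPostings fullIndex = new ToyPostings();
        fullIndex.add(0, 3);
        fullIndex.add(0, 7);
        fullIndex.add(2, 1);
        System.out.println(fullIndex.walk());

        // A term vector covers exactly one document, so the outer loop runs once.
        ToyPostings termVector = new ToyPostings();
        termVector.add(0, 3);
        termVector.add(0, 7);
        System.out.println(termVector.walk());
    }
}
```

With the real enum the outer step is explicit: nothing can be read until nextDoc() has moved it onto a document, even when only one document exists.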


Finally, DocsAndPositionsEnum is a deprecated API; it is better to use PostingsEnum instead. The API is the same:

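// Request positions explicitly; postings(null) alone only asks for frequencies.
org.apache.lucene.index.PostingsEnum dpe =
    itr.postings(null, org.apache.lucene.index.PostingsEnum.POSITIONS);
dpe.nextDoc();       // move onto the single document of the term vector
dpe.freq();          // number of occurrences of the term in that document
dpe.nextPosition();  // call freq() times to read every position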

Regarding "java - How to get the positions of a specific term generated by Lucene (5.3)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34370784/
