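I am trying to read an inverted index with Lucene 5.3.0. When I use the code below to get the positions of a specific term, dpe (a DocsAndPositionsEnum) is null. Here is the implementation.
Code in the function createIndex: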
FieldType myFieldType = new FieldType(TextField.TYPE_STORED);
myFieldType.setStoreTermVectors(true);
myFieldType.storeTermVectorPositions();
document.add(new Field("contents", content, myFieldType));
Code snippet from the function showIndex:
Document doc = reader.document(docNum);
System.out.println("Processing file:"+doc.get("filename"));
Terms termVector = reader.getTermVector(docNum, "contents");
System.out.println("termVector is null?"+String.valueOf(termVector==null));
TermsEnum itr = termVector.iterator();
BytesRef term = null;
while((term = itr.next()) != null){
    try{
        DocsAndPositionsEnum dpe = itr.docsAndPositions(null, null);
        System.out.println(dpe==null);
        int freq = -1;
        if (dpe != null) freq = dpe.freq();
        System.out.println(freq);
        for (int fi = 0; fi< freq; fi++){
            final int position = dpe.nextPosition();
            System.out.println("position: "+ String.valueOf(position));
        }
        String termText = term.utf8ToString();
        Term termInstance = new Term("contents",term);
        long termFreq = reader.totalTermFreq(termInstance);
        long docCount = reader.docFreq(termInstance);
        System.out.println("term: "+termText+", termFreq = "+termFreq+", docCount = "+docCount);
    }catch(Exception e){
        System.out.println(e);
    }
}
Can you help me solve this problem?
Many thanks!
Best answer
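storeTermVectorPositions() is a getter: calling it reads the flag but changes nothing, so positions are never stored in the term vector, and that is why docsAndPositions(...) returns null. In createIndex you have to call the setter setStoreTermVectorPositions instead: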
myFieldType.setStoreTermVectorPositions(true);
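The root cause is an ordinary Java pitfall: a getter called as a statement compiles fine but has no effect. The sketch below illustrates this with a hypothetical ToyFieldType class (not Lucene's FieldType) that only mirrors the getter/setter naming of the real class.

```java
// Hypothetical stand-in for Lucene's FieldType: it mirrors only the
// getter/setter naming pattern, nothing else from the real class.
class ToyFieldType {
    private boolean storeTermVectorPositions = false;

    // Getter: reads the flag and changes nothing.
    boolean storeTermVectorPositions() {
        return storeTermVectorPositions;
    }

    // Setter: the call that actually enables positions.
    void setStoreTermVectorPositions(boolean value) {
        storeTermVectorPositions = value;
    }
}

class GetterVsSetterDemo {
    // Mimics the original createIndex code: the getter's result is discarded.
    static boolean afterGetterCall() {
        ToyFieldType ft = new ToyFieldType();
        ft.storeTermVectorPositions();          // no effect, flag stays false
        return ft.storeTermVectorPositions();
    }

    // Mimics the fixed code: the setter turns the flag on.
    static boolean afterSetterCall() {
        ToyFieldType ft = new ToyFieldType();
        ft.setStoreTermVectorPositions(true);   // flag is now true
        return ft.storeTermVectorPositions();
    }
}
```

With the getter-only version the flag is still false afterwards; only the setter call enables positions.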
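Normally you iterate a DocsAndPositionsEnum on two levels, first over documents and then over the positions within each document. Because this enum comes from a term vector, it contains only a single document, but you still have to position it on that document with nextDoc() before you can read the frequency and the positions:
DocsAndPositionsEnum dpe = itr.docsAndPositions(null, null);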
int docId = dpe.nextDoc();
assert docId == 0; // a term vector is a single-document index, so its only doc ID is 0, not docNum
int freq = dpe.freq();
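To make the "position first, then read" order concrete, here is a minimal model of a term-vector postings enum. ToyTermVectorPostings is a hypothetical class, not Lucene's; it assumes, as described above, that the enum covers exactly one document (ID 0) and starts unpositioned at -1. It throws on misuse to make the ordering rule visible, whereas the real API simply does not define what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a term-vector postings enum (NOT Lucene's class):
// one document with ID 0, which must be reached via nextDoc() before
// freq() or nextPosition() may be called.
class ToyTermVectorPostings {
    // Same sentinel value Lucene uses for DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] positions;
    private int doc = -1;          // unpositioned, like docID() before nextDoc()
    private int nextPosIndex = 0;

    ToyTermVectorPostings(int... positions) {
        this.positions = positions.clone();
    }

    // First call lands on the single document (0); later calls are exhausted.
    int nextDoc() {
        doc = (doc == -1) ? 0 : NO_MORE_DOCS;
        return doc;
    }

    int freq() {
        if (doc != 0) throw new IllegalStateException("call nextDoc() first");
        return positions.length;
    }

    int nextPosition() {
        if (doc != 0) throw new IllegalStateException("call nextDoc() first");
        return positions[nextPosIndex++];
    }

    // The access pattern from the answer: position on the one document,
    // then read exactly freq() positions.
    static List<Integer> readAllPositions(ToyTermVectorPostings pe) {
        List<Integer> out = new ArrayList<>();
        if (pe.nextDoc() == NO_MORE_DOCS) return out;
        int freq = pe.freq();
        for (int i = 0; i < freq; i++) {
            out.add(pe.nextPosition());
        }
        return out;
    }
}
```

Skipping the nextDoc() call and going straight to freq(), as the question's loop does, is exactly the misuse this model rejects.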
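Finally, DocsAndPositionsEnum is a deprecated API; it is better to use PostingsEnum instead. The API is essentially the same, but request positions explicitly with the PostingsEnum.POSITIONS flag, because the one-argument postings(reuse) only asks for document IDs and frequencies:
org.apache.lucene.index.PostingsEnum dpe = itr.postings(null, org.apache.lucene.index.PostingsEnum.POSITIONS);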
dpe.nextDoc();
dpe.freq();
dpe.nextPosition();
Source: the Stack Overflow question "java - How to get the positions of a specific term generated by Lucene (5.3)": https://stackoverflow.com/questions/34370784/