Problem description
I am trying to write a simple program using Lucene 2.9.4 that runs a phrase query, but I am getting 0 hits:
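import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {
    public static void main(String[] args) throws IOException, ParseException {
        // Build a small in-memory index of four book titles.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        addDoc(w, "Lucene in Action");
        addDoc(w, "Lucene for Dummies");
        addDoc(w, "Managing Gigabytes");
        addDoc(w, "The Art of Computer Science");
        w.close();

        // Exact phrase "lucene in" (slop 0).
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("content", "lucene"), 0);
        pq.add(new Term("content", "in"), 1);
        pq.setSlop(0);

        // Search and print the hits.
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index, true);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(pq, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; i++) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + "." + d.get("content"));
        }
        searcher.close();
    }

    public static void addDoc(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field("content", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
        w.addDocument(doc);
    }
}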
Please tell me what is wrong. I have also tried using QueryParser, as follows:
String querystr ="\"Lucene in Action\"";
Query q = new QueryParser(Version.LUCENE_29, "content",analyzer).parse(querystr);
but that did not work either.
Recommended answer
There are two issues with the code (and they have nothing to do with your version of Lucene):
1) The StandardAnalyzer does not index stopwords (like "in"), so the PhraseQuery will never be able to find the phrase "Lucene in".
2) As mentioned by Xodarap and Shashikant Kore, your call to create a document needs to use Index.ANALYZED; otherwise Lucene does not run the Analyzer on this field and indexes the whole value as a single term. There's probably a nifty way to do it with Index.NOT_ANALYZED, but I'm not familiar with it.
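To make both points concrete, here is a toy model of what ends up in the index in each case. It does not call Lucene at all: the class ToyAnalysis, its methods, and the four-word stopword list are made up for illustration, and it assumes the analyzer leaves a position gap where a stopword was removed (which, as far as I know, is what StandardAnalyzer does with Version.LUCENE_29).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: it imitates the effect of analysis, it is not Lucene code.
class ToyAnalysis {
    // Hypothetical, tiny stopword list; Lucene's default list is longer.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("in", "for", "the", "of"));

    // Rough stand-in for Index.ANALYZED with a stopword-removing analyzer:
    // split on whitespace, lowercase, drop stopwords, keep each word's position.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        String[] words = value.trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            String term = words[pos].toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(term)) {
                terms.add(term + "@" + pos);
            }
        }
        return terms;
    }

    // Rough stand-in for Index.NOT_ANALYZED: the whole value is one term.
    static List<String> notAnalyzed(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value + "@0");
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyzed("Lucene in Action"));    // [lucene@0, action@2]
        System.out.println(notAnalyzed("Lucene in Action")); // [Lucene in Action@0]
    }
}
```

In the second case there is no term "lucene" in the index at all, only the single term "Lucene in Action", so neither the PhraseQuery nor the QueryParser attempt has anything to match against.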
For an easy fix, change your addDoc method to:
public static void addDoc(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    // ANALYZED: run the analyzer so the title is indexed as individual terms.
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}
and modify your creation of the PhraseQuery to:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "computer"),0);
pq.add(new Term("content", "science"),1);
pq.setSlop(0);
This will give you the result below, since neither "computer" nor "science" is a stopword:
Found 1 hits.
1.The Art of Computer Science
If you want to find "Lucene in Action", you can increase the slop of this PhraseQuery (allowing a larger 'gap' between the two words):
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "lucene"),0);
pq.add(new Term("content", "action"),1);
pq.setSlop(1);
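Why slop 1 is enough here can be sketched with a small model of sloppy phrase matching. This is not Lucene's scorer: ToySlop and its methods are hypothetical names, terms are assumed not to repeat within the query, and the document positions assume "in" was removed but left a gap (lucene at 0, action at 2). The idea is to subtract each term's query position from its document position and check that the spread of the results is no larger than the slop.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of sloppy phrase matching; not Lucene's implementation.
class ToySlop {
    // docPositions.get(i): positions where query term i occurs in the document.
    // queryPositions[i]: the position given to PhraseQuery.add(term, position).
    static boolean matches(List<List<Integer>> docPositions, int[] queryPositions, int slop) {
        return search(docPositions, queryPositions, slop, 0,
                Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Try every combination of one occurrence per term, tracking the smallest and
    // largest value of (document position - query position) seen so far.
    private static boolean search(List<List<Integer>> docPositions, int[] queryPositions,
                                  int slop, int term, int min, int max) {
        if (term == docPositions.size()) {
            return max - min <= slop;
        }
        for (int docPos : docPositions.get(term)) {
            int shifted = docPos - queryPositions[term];
            if (search(docPositions, queryPositions, slop, term + 1,
                    Math.min(min, shifted), Math.max(max, shifted))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "Lucene in Action" after stopword removal: lucene@0, action@2.
        List<List<Integer>> doc = Arrays.asList(Arrays.asList(0), Arrays.asList(2));
        System.out.println(matches(doc, new int[] {0, 1}, 0)); // false
        System.out.println(matches(doc, new int[] {0, 1}, 1)); // true
        System.out.println(matches(doc, new int[] {0, 2}, 0)); // true
    }
}
```

The last line suggests an alternative in this model: add "action" at query position 2 and keep the slop at 0. Whether that works against a real index depends on the analyzer actually leaving the gap, so treat it as something to try rather than a guarantee.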
If you really want to search for the phrase "lucene in", you will need to select a different analyzer (like the SimpleAnalyzer), which does not remove stopwords. In Lucene 2.9, just replace your call to the StandardAnalyzer with:
SimpleAnalyzer analyzer = new SimpleAnalyzer();
Or, if you're using version 3.1 or higher, you need to add the version information:
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
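As a rough picture of why this helps: SimpleAnalyzer splits the text at non-letter characters and lowercases it, with no stopword removal. The sketch below imitates that with plain Java (ToyLetterTokenizer is a made-up name, and this is not Lucene code), so "in" survives as a term sitting right after "lucene".

```java
import java.util.ArrayList;
import java.util.List;

// Toy imitation of "split at non-letters, then lowercase"; not Lucene code.
class ToyLetterTokenizer {
    static List<String> tokenize(String value) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current term.
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene in Action")); // [lucene, in, action]
    }
}
```

With terms like these in the index, the original PhraseQuery (lucene at 0, in at 1, slop 0) has adjacent terms to match.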
Here is a helpful post on a similar issue (this will help you get going with PhraseQuery): "Exact Phrase search using Lucene?" (see WhiteFang34's answer).