Lucene phrase query not working

Question

I am trying to write a simple program using Lucene 2.9.4 that runs a phrase query, but I am getting 0 hits:

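import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {

public static void main(String[] args) throws IOException, ParseException{
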
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
    Directory index = new RAMDirectory();

    IndexWriter w = new IndexWriter(index,analyzer,true,IndexWriter.MaxFieldLength.UNLIMITED);
    addDoc(w, "Lucene in Action");
    addDoc(w, "Lucene for Dummies");
    addDoc(w, "Managing Gigabytes");
    addDoc(w, "The Art of Computer Science");
    w.close();

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "lucene"),0);
    pq.add(new Term("content", "in"),1);
    pq.setSlop(0);

    int hitsPerPage = 10;
    IndexSearcher searcher = new IndexSearcher(index,true);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(pq, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    System.out.println("Found " + hits.length + " hits.");
    for(int i=0; i<hits.length; i++){
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println((i+1)+ "." + d.get("content"));
    }

    searcher.close();


}

public static void addDoc(IndexWriter w, String value)throws IOException{
    Document doc = new Document();
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
    w.addDocument(doc);
}

}

Please tell me what is wrong. I have also tried using QueryParser, as follows:

String querystr ="\"Lucene in Action\"";

    Query q = new QueryParser(Version.LUCENE_29, "content",analyzer).parse(querystr);

But this does not work either.

Recommended answer

There are two issues with the code (and they have nothing to do with your version of Lucene):

1) The StandardAnalyzer does not index stop words (like "in"), so the PhraseQuery will never be able to find the phrase "Lucene in".

2) As mentioned by Xodarap and Shashikant Kore, the call that creates the field needs to use Field.Index.ANALYZED; otherwise Lucene does not run the Analyzer on that field and indexes the whole value as a single term, so a query for an individual word such as "lucene" cannot match. There's probably a nifty way to do it with Index.NOT_ANALYZED, but I'm not familiar with it.
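To make the two points concrete, here is a minimal sketch in plain Java (no Lucene dependency) of which terms end up in the index in each case. The class `ToyAnalysis`, its methods, and the four-word stop list are hypothetical names and data chosen for illustration; this is a rough model of the behavior described above, not how StandardAnalyzer is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalysis {
    // Illustrative stop-word list (an assumption, not StandardAnalyzer's full set).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "for", "the", "of"));

    // Rough model of an analyzed field: lowercased words with stop words dropped.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        for (String word : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                terms.add(word);
            }
        }
        return terms;
    }

    // Rough model of a not-analyzed field: the whole value is one unchanged term.
    static List<String> notAnalyzed(String value) {
        return Arrays.asList(value);
    }

    public static void main(String[] args) {
        String title = "Lucene in Action";
        System.out.println(notAnalyzed(title));                    // [Lucene in Action]
        System.out.println(analyzed(title));                       // [lucene, action]
        System.out.println(notAnalyzed(title).contains("lucene")); // false
        System.out.println(analyzed(title).contains("in"));        // false
    }
}
```

In this model the not-analyzed field never contains the term "lucene", and the analyzed field never contains "in", which is why the original query gets 0 hits either way.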

For an easy fix, change your addDoc method to:

public static void addDoc(IndexWriter w, String value)throws IOException{
    Document doc = new Document();
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}

and change the construction of the PhraseQuery to:

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "computer"),0);
    pq.add(new Term("content", "science"),1);
    pq.setSlop(0);

This will give you the result below, since neither "computer" nor "science" is a stop word:

    Found 1 hits.
    1.The Art of Computer Science

If you want to find "Lucene in Action", you can increase the slop of this PhraseQuery (allowing a gap between the two words, here the position of the removed stop word "in"):

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "lucene"),0);
    pq.add(new Term("content", "action"),1);
    pq.setSlop(1);
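
The effect of the slop value can be sketched in plain Java (no Lucene dependency). The sketch assumes that removing a stop word leaves a gap in the term positions, which is what the slop of 1 above implies. `ToySlop`, `positions`, and `matches` are hypothetical names, and the check is a simplified two-term, in-order version of phrase matching, not Lucene's actual algorithm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ToySlop {
    // Illustrative stop-word list (an assumption, not StandardAnalyzer's full set).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "for", "the", "of"));

    // Map each kept term to its original word position; dropped stop words leave gaps.
    static Map<String, Integer> positions(String value) {
        Map<String, Integer> pos = new LinkedHashMap<>();
        String[] words = value.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (!STOP_WORDS.contains(words[i])) {
                pos.putIfAbsent(words[i], i);
            }
        }
        return pos;
    }

    // Simplified check: 'second' should directly follow 'first';
    // slop is the number of extra positions tolerated between them.
    static boolean matches(Map<String, Integer> pos, String first, String second, int slop) {
        Integer a = pos.get(first);
        Integer b = pos.get(second);
        if (a == null || b == null) {
            return false;
        }
        int extraGap = (b - a) - 1;
        return extraGap >= 0 && extraGap <= slop;
    }

    public static void main(String[] args) {
        Map<String, Integer> pos = positions("Lucene in Action");
        System.out.println(pos);                                   // {lucene=0, action=2}
        System.out.println(matches(pos, "lucene", "action", 0));   // false
        System.out.println(matches(pos, "lucene", "action", 1));   // true
    }
}
```

With slop 0 the gap left by "in" prevents a match; with slop 1 the gap is tolerated.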

If you really want to search for the phrase "lucene in", you will need to select a different analyzer that keeps stop words (like the SimpleAnalyzer). In Lucene 2.9, just replace your call to the StandardAnalyzer with:

    SimpleAnalyzer analyzer = new SimpleAnalyzer();

Or, if you're using version 3.1 or higher, you need to pass the version information:

    SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
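
As a rough illustration of why this helps, the plain-Java sketch below (no Lucene dependency) tokenizes the way SimpleAnalyzer is documented to: it splits at non-letter characters, lowercases, and keeps every word. `ToySimpleTokens` and `tokens` are hypothetical names for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

public class ToySimpleTokens {
    // Split at non-letter characters and lowercase; no stop words are removed.
    static List<String> tokens(String value) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> t = tokens("Lucene in Action");
        System.out.println(t);                                      // [lucene, in, action]
        System.out.println(t.indexOf("in") - t.indexOf("lucene"));  // 1
    }
}
```

Because "in" survives and sits directly after "lucene", a phrase query for those two terms with slop 0 has something to match.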

Here is a helpful post on a similar issue (it will help you get going with PhraseQuery): "Exact Phrase search using Lucene?"; see WhiteFang34's answer.
