Question
I'm creating a Lucene 4.10.3 index using the StandardAnalyzer:
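import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
String indexpath = "C:\\TEMP";
// EMPTY_SET disables stop words, but StandardAnalyzer still lowercases and splits on word boundaries
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_3, new StandardAnalyzer(CharArraySet.EMPTY_SET));
// the open mode is read when the IndexWriter is constructed, so set it first
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
Directory dir = FSDirectory.open(new File(indexpath));
IndexWriter indexWriter = new IndexWriter(dir, iwc);
String[] cities = {"ANDHRA", "ANDHRA PRADESH", "ASSAM AND NAGALAND", "ASSAM", "PUNJAB", "PUNJAB AND HARYANA"};
for (String city : cities) {
    Document doc = new Document(); // one document per city value
    doc.add(new TextField("city", city, Store.YES));
    indexWriter.addDocument(doc);
}
indexWriter.close(); // commits the added documents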
When I search the index with a phrase query, for example:
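import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.QueryBuilder;
try (DirectoryReader indexReader = DirectoryReader.open(FSDirectory.open(new File("C:\\TEMP")))) {
    // analyze the query with the indexing analyzer so "ANDHRA" becomes the indexed term "andhra"
    QueryBuilder build = new QueryBuilder(new StandardAnalyzer(CharArraySet.EMPTY_SET));
    Query q1 = build.createPhraseQuery("city", "ANDHRA");
    IndexSearcher searcher = new IndexSearcher(indexReader);
    ScoreDoc[] hits = searcher.search(q1, 10).scoreDocs;
    Set<String> fieldsToLoad = Collections.singleton("city");
    for (ScoreDoc hit : hits) {
        Document document = indexReader.document(hit.doc, fieldsToLoad);
        System.out.println(document.get("city"));
    }
} catch (IOException e) {
    e.printStackTrace();
}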
I get the following results:
ANDHRA
ANDHRA PRADESH
When I search for "ANDHRA", how can I get only the "ANDHRA" result and not "ANDHRA PRADESH"? How do I match the entire field value in Lucene while using StandardAnalyzer?
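To see why both documents come back, here is a toy plain-Java model of the matching (not Lucene code). The tokenize method is a hypothetical, rough stand-in for StandardAnalyzer that lowercases and splits on non-alphanumeric characters; a phrase matches when its tokens appear consecutively anywhere in the field's tokens. Nothing requires the phrase to span the whole field, so "andhra" matches inside "andhra pradesh".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model (NOT Lucene) of why a one-word phrase query also hits longer values.
// tokenize() is a rough, hypothetical stand-in for StandardAnalyzer.
class PhraseMatchDemo {

    // Lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A phrase matches when its tokens occur consecutively anywhere in the
    // field's tokens; it does not have to cover the whole field.
    static boolean phraseMatches(String fieldValue, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        return !phraseTokens.isEmpty()
                && Collections.indexOfSubList(tokenize(fieldValue), phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("ANDHRA", "ANDHRA PRADESH", "ASSAM AND NAGALAND",
                "ASSAM", "PUNJAB", "PUNJAB AND HARYANA");
        for (String city : cities) {
            if (phraseMatches(city, "ANDHRA")) {
                System.out.println(city); // prints ANDHRA, then ANDHRA PRADESH
            }
        }
    }
}
```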
Answer
If you want to match the exact, unmodified, untokenized value of the field, you shouldn't be analyzing it at all. Simply use a StringField instead of a TextField.
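In Lucene 4.10.3 terms, that means indexing with new StringField("city", city, Store.YES) and querying with new TermQuery(new Term("city", "ANDHRA")), building the term directly rather than running the text through StandardAnalyzer, which would lowercase it. The sketch below does not use Lucene; it is a toy in-memory inverted index with hypothetical names, contrasting a TextField-like tokenized index with a StringField-like one where the whole value is a single term. It also shows that such an exact match is case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of an inverted index (NOT Lucene): each indexed term maps to the
// ids of the documents containing it. Class and method names are hypothetical.
class ExactMatchDemo {

    static final List<String> CITIES = Arrays.asList(
            "ANDHRA", "ANDHRA PRADESH", "ASSAM AND NAGALAND",
            "ASSAM", "PUNJAB", "PUNJAB AND HARYANA");

    // Rough stand-in for StandardAnalyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // tokenized == true models TextField; false models StringField, where the
    // whole unmodified value is indexed as a single term.
    static Map<String, Set<Integer>> buildIndex(List<String> values, boolean tokenized) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int doc = 0; doc < values.size(); doc++) {
            String value = values.get(doc);
            List<String> terms = tokenized ? analyze(value) : Collections.singletonList(value);
            for (String term : terms) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc);
            }
        }
        return index;
    }

    // Models a TermQuery: the term is looked up exactly as given, with no analysis.
    static List<String> search(Map<String, Set<Integer>> index, List<String> values, String term) {
        List<String> hits = new ArrayList<>();
        for (int doc : index.getOrDefault(term, Collections.<Integer>emptySet())) {
            hits.add(values.get(doc));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> textIndex = buildIndex(CITIES, true);
        Map<String, Set<Integer>> stringIndex = buildIndex(CITIES, false);
        System.out.println(search(textIndex, CITIES, "andhra"));   // [ANDHRA, ANDHRA PRADESH]
        System.out.println(search(stringIndex, CITIES, "ANDHRA")); // [ANDHRA]
        System.out.println(search(stringIndex, CITIES, "andhra")); // [] (exact, case-sensitive)
    }
}
```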
If you want some analysis (e.g. lowercasing) but no tokenization, you can use KeywordTokenizer in your Analyzer implementation.
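In Lucene 4.x such an Analyzer would, roughly, return a KeywordTokenizer wrapped in a LowerCaseFilter from createComponents(String fieldName, Reader reader), and it must be applied at both index and query time; PerFieldAnalyzerWrapper is one way to use it for the city field only. The sketch below is again a toy plain-Java model with hypothetical names, not Lucene's API: the whole value stays a single term, only its case changes, so "Andhra" finds "ANDHRA" but not "ANDHRA PRADESH".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model (NOT Lucene) of an analyzer chain made of a KeywordTokenizer
// followed by a lowercase filter. Names here are hypothetical.
class KeywordLowercaseDemo {

    // KeywordTokenizer step: the entire value is emitted as exactly one token.
    // Lowercase step: that one token is lowercased; nothing is split off.
    static List<String> analyze(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    // Index and query go through the same analysis; a document matches when
    // its single term equals the query's single term.
    static List<String> search(List<String> values, String query) {
        String queryTerm = analyze(query).get(0);
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (analyze(value).get(0).equals(queryTerm)) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("ANDHRA", "ANDHRA PRADESH", "ASSAM", "PUNJAB AND HARYANA");
        System.out.println(search(cities, "Andhra"));         // [ANDHRA]
        System.out.println(search(cities, "andhra pradesh")); // [ANDHRA PRADESH]
    }
}
```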
If you are using a QueryParser to create your queries, be aware that the parser uses spaces to separate query clauses. You may find it necessary to write queries like city:ANDHRA\ PRADESH (I do not believe QueryParser.escape will do this for you).
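The space escaping can be automated. The helper below, escapeWithSpaces, is hypothetical (it is not a Lucene method): it backslash-escapes the characters assumed to be treated as syntax by the classic QueryParser, following the list in Lucene 4.x QueryParser.escape as best understood, and additionally escapes whitespace. This only produces a single term if the parser's analyzer also keeps the value whole, for example KeywordAnalyzer when querying a StringField.

```java
// Hypothetical helper, not part of Lucene. SPECIAL is an assumed copy of the
// characters escaped by Lucene 4.x QueryParser.escape; whitespace is escaped
// too, because the parser would otherwise split the value into separate clauses.
class QueryEscapeDemo {

    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escapeWithSpaces(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints city:ANDHRA\ PRADESH
        System.out.println("city:" + escapeWithSpaces("ANDHRA PRADESH"));
    }
}
```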