Problem description
I am posting this question again because my earlier one was not answered.
I am working on a book search API using Lucene. A user can search for a book whose title or description field contains C.F.A... I am using StandardAnalyzer along with a list of stop words.
I am using MultiFieldQueryParser to parse the above string, but after parsing, the dots in the string have been removed. What am I missing here?
Thanks.
As you mentioned, this is a dupe of this question. I suggest you at least add a link to it in your question. Also, I would urge you to create a user account, since right now it's not possible to look at your old question to get context.
The StandardAnalyzer specifically handles acronyms, and converts C.F.A. (for example) to cfa. This means you should be able to do the search, as long as you make sure you use the same analyzer for the indexing and for the query parsing.
I would suggest you run some more basic test cases to eliminate other factors. Try using an ordinary QueryParser instead of a multi-field one.
Here's some code I wrote to play with the StandardAnalyzer:
    StringReader testReader = new StringReader("C.F.A. C.F.A word");
    StandardAnalyzer analyzer = new StandardAnalyzer();
    TokenStream tokenStream = analyzer.tokenStream("title", testReader);
    System.out.println(tokenStream.next());
    System.out.println(tokenStream.next());
    System.out.println(tokenStream.next());
The output for this, by the way, was:
    (cfa,0,6,type=<ACRONYM>)
    (c.f.a,7,12,type=<HOST>)
    (word,13,17,type=<ALPHANUM>)
Note, for example, that if the acronym doesn't end with a dot then the analyzer assumes it's an internet host name, so searching for "C.F.A" will not match "C.F.A." in the text.
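One possible workaround for that mismatch, sketched here under assumptions rather than taken from Lucene itself: collapse dotted acronyms to plain letters yourself, and apply the same step both to the text you index and to the query string before it reaches the parser. The `AcronymNormalizer` class and its regular expression are hypothetical names introduced for illustration; the sketch uses only `java.util.regex` and assumes Java 9 or later.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: collapses dotted acronyms so that
// "C.F.A." and "C.F.A" both become "CFA" before the text reaches the analyzer.
final class AcronymNormalizer {

    // Single letters separated by dots, with an optional trailing dot.
    // The lookarounds stop a match from starting or ending inside a longer
    // dotted name, so host names such as "www.example.com" are left alone.
    private static final Pattern DOTTED_ACRONYM = Pattern.compile(
            "(?<![A-Za-z0-9.])(?:[A-Za-z]\\.)+[A-Za-z]\\.?(?!\\.?[A-Za-z0-9])");

    private AcronymNormalizer() {}

    static String normalize(String text) {
        // Matcher.replaceAll(Function) requires Java 9+.
        return DOTTED_ACRONYM.matcher(text)
                .replaceAll(match -> match.group().replace(".", ""));
    }

    public static void main(String[] args) {
        // prints "CFA CFA word"
        System.out.println(normalize("C.F.A. C.F.A word"));
    }
}
```

With this in place, both spellings reach the analyzer as the same plain word, so it no longer matters whether the user typed the trailing dot. The trade-off is that short dotted abbreviations such as "e.g." are collapsed too, which may or may not be what you want.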