

If I'm trying to retrieve "dave@gmail.com", searching for "dave" will work, as will "dave@gmail.com".

But searching for "dave@gmail" won't work. The query takes place inside a Java servlet. I believe the problem may lie with the address being split at the full stop.

How can I fix this so that "dave@gmail" will return "dave@gmail.com"? Email addresses may also use other domains (like .co.uk).




Lucene uses 'Analysers' to tokenise and index your documents. Likewise, analysers are used to tokenise the user search query.

A common mistake is to use a different analyser for indexing than for searching. The two must match for you to get the results you expect (search this doc for "common mistake").
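To illustrate why the two sides must match, here is a minimal sketch in plain Java. It is not Lucene code: `SharedAnalyserIndex` and its whitespace tokeniser are hypothetical names invented for this example. The point is only the design: one tokeniser is injected once and reused for both indexing and searching, so the two cannot drift apart.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical toy index, not a Lucene class: a single tokeniser is shared
// by the indexing path and the search path.
class SharedAnalyserIndex {
    private final Function<String, List<String>> tokeniser;
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    SharedAnalyserIndex(Function<String, List<String>> tokeniser) {
        this.tokeniser = tokeniser;
    }

    // Index time: tokenise the text and record which document holds each token.
    void add(int docId, String text) {
        for (String token : tokeniser.apply(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    // Search time: tokenise the query the same way, then intersect the posting sets.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String token : tokeniser.apply(query)) {
            Set<Integer> docs = postings.getOrDefault(token, new HashSet<>());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    public static void main(String[] args) {
        // Whitespace tokeniser chosen only for illustration.
        SharedAnalyserIndex index = new SharedAnalyserIndex(
            s -> Arrays.asList(s.toLowerCase().trim().split("\\s+")));
        index.add(1, "dave@gmail.com");
        System.out.println(index.search("dave@gmail.com")); // [1]
    }
}
```

In Lucene itself the equivalent habit is to pass the same analyser to whatever writes the index and to whatever parses the query.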


The standard Lucene tokeniser recognises email strings and indexes them as one token.

It will index dave@gmail.com as [token:dave@gmail.com]. However, the analyser you are using to tokenise your query (or your own code, if you are constructing the query manually) may be breaking it up into 3 tokens, splitting at the non-alphanumeric characters. So you might be searching for 3 adjacent tokens, [tok1:dave] [tok2:gmail] [tok3:com], which don't exist in the index.
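The mismatch described above can be simulated in plain Java. This is only a sketch under stated assumptions: `indexTokens` and `queryTokens` are hypothetical stand-ins for an index-side and a query-side analyser, not Lucene classes, and real analysers do more than split strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java simulation of the mismatch; these helpers are hypothetical
// stand-ins for analysers, not part of Lucene.
class TokenMismatchDemo {

    // Index side: split on whitespace only, so an email stays one token.
    static List<String> indexTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Query side: split at every non-alphanumeric character.
    static List<String> queryTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // A query matches only if every query token is among the indexed tokens.
    static boolean matches(List<String> indexed, List<String> query) {
        return !query.isEmpty() && indexed.containsAll(query);
    }

    public static void main(String[] args) {
        List<String> indexed = indexTokens("dave@gmail.com");
        System.out.println(indexed);                                          // [dave@gmail.com]
        System.out.println(queryTokens("dave@gmail.com"));                    // [dave, gmail, com]
        System.out.println(matches(indexed, queryTokens("dave@gmail.com"))); // false
        System.out.println(matches(indexed, indexTokens("dave@gmail.com"))); // true
    }
}
```

The third line prints false because none of the three query tokens equals the single indexed token; tokenising the query the same way as the document makes the lookup succeed.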


Query.toString() will probably "pretty print" the Query you are submitting to Lucene, which may help you debug.


09-27 02:21