I want to implement phonetic search using Lucene 6.1.0, with Soundex or any algorithm suitable for Portuguese. I found many incomplete examples on the internet showing how to implement a custom tokenizer or analyzer, but the abstract classes used in those examples seem to be different in version 6.1.0. Can anyone point me to good documentation on Lucene, not just the Javadocs, that explains how to put these pieces together?
The Analyzer documentation shows how to create your analyzer.
For phonetic analysis, look at the org.apache.lucene.analysis.phonetic package. You'll need to add "lucene-analyzers-phonetic-6.1.0.jar" to your build path, as well as Apache's "commons-codec-1.10.jar", which is available from the Apache Commons Codec project.
Then you can set up your analyzer like this, for instance:
```java
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new StandardTokenizer();
        // Double Metaphone, max code length 6; inject=false replaces each token with its code
        TokenStream stream = new DoubleMetaphoneFilter(tokenizer, 6, false);
        return new TokenStreamComponents(tokenizer, stream);
    }
};
```
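To illustrate what a phonetic filter does to each token, here is a minimal plain-Java sketch of classic American Soundex. It keeps the first letter, maps the remaining consonants to digits, collapses repeated codes and pads the result to four characters. The class name `SoundexSketch` is hypothetical and is for illustration only. In a real analyzer you would not write this by hand: you would plug commons-codec's `org.apache.commons.codec.language.Soundex` encoder into Lucene's `PhoneticFilter`, e.g. `new PhoneticFilter(tokenizer, new Soundex(), false)`. The sketch assumes ASCII input and simply drops every other character, so accented letters such as ç would need to be folded to ASCII first.

```java
import java.util.Locale;

// Hypothetical illustration of American Soundex; not a Lucene or commons-codec class.
final class SoundexSketch {

    // Digit for each letter. '0' = vowel or Y (separates equal codes),
    // '-' = H or W (transparent: equal codes on both sides still collapse).
    private static char code(char letter) {
        switch (letter) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            case 'H': case 'W':
                return '-';
            default:
                return '0';
        }
    }

    static String encode(String word) {
        // Keep only ASCII letters, upper-cased; everything else is dropped.
        StringBuilder letters = new StringBuilder();
        for (char ch : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {
                letters.append(ch);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        // The first letter is kept as-is; its code still counts for collapsing.
        StringBuilder out = new StringBuilder().append(letters.charAt(0));
        char last = code(letters.charAt(0));
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char current = code(letters.charAt(i));
            if (current == '-') {
                continue; // H/W: keep 'last' so the codes around it can collapse
            }
            if (current != '0' && current != last) {
                out.append(current);
            }
            last = current; // a vowel resets 'last' to '0'
        }
        // Pad to the fixed length of four characters.
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }
}
```

For example, `SoundexSketch.encode("Souza")` and `SoundexSketch.encode("Sousa")` both return `S200`, so the two spellings would match if the index and the queries go through the same phonetic analyzer. Keep in mind that Soundex and Double Metaphone were designed for English, so their behaviour on Portuguese words is only approximate.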