StandardTokenizerFactory

Problem description

I'm using Solr with StandardTokenizerFactory on a text field, but I don't want it to split on underscores. Do I have to switch to another tokenizer such as PatternTokenizerFactory, or can I do this with StandardTokenizerFactory? I need the same behavior as StandardTokenizerFactory, just without the split on underscores.

Recommended answer

I don't think StandardTokenizerFactory can be configured to do this. One workaround is to replace each underscore, before tokenization, with a placeholder that StandardTokenizerFactory will not split on and that your documents do not otherwise contain. For example, use PatternReplaceCharFilterFactory to replace every _ with QQ, run the text through StandardTokenizerFactory, and then use PatternReplaceFilterFactory to turn QQ back into _. Here is a fieldType definition that does this:

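<fieldType name="text_std_prot" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- Before tokenization: swap every underscore for the placeholder QQ -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="_"
                    replacement="QQ"/>
        <!-- The tokenizer sees QQ as ordinary letters, so it does not split there -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- After tokenization: restore the underscore inside each token -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="QQ"
                replacement="_"/>
        ...
    </analyzer>
</fieldType>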

And here is a screen shot of what happens:
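
As a possible variation on the definition above, the sketch below uses a longer placeholder to lower the chance of colliding with real text, and shows where a lowercase filter could go. The placeholder QQUSQQ, the type name text_std_keep_us, and the field name title are all hypothetical choices made for illustration; none of them come from the original answer. The lowercase filter is placed after the restore step on the assumption that the pattern match is case sensitive, so lowercasing first would turn the placeholder into qqusqq and the restore pattern would no longer match.

```xml
<!-- Sketch only: QQUSQQ, text_std_keep_us and title are hypothetical names -->
<fieldType name="text_std_keep_us" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- 1. Hide underscores behind a letters-only placeholder -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="_"
                    replacement="QQUSQQ"/>
        <!-- 2. Tokenize; the placeholder is plain letters, so no split occurs there -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- 3. Restore the underscore in every token (replace="all" covers repeats) -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="QQUSQQ"
                replacement="_"
                replace="all"/>
        <!-- 4. Lowercase only after the placeholder has been restored -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- A field that uses the type above -->
<field name="title" type="text_std_keep_us" indexed="true" stored="true"/>
```

Because the analyzer is not split into index and query variants, the same chain would apply at both index time and query time, so a query for foo_bar should go through the same placeholder round trip as the indexed text. Whatever placeholder is chosen, any literal occurrence of it in the documents would also be rewritten to an underscore, which is why it needs to be a string the data does not otherwise contain.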
