Problem description
In my analyzer chain, ShingleFilter comes after the stopword filter. As mentioned in the docs, ShingleFilter handles position increments > 1 by inserting filler tokens (tokens whose term text is "_").
For example: "please divide this sentence into biword shingles"
Shingles of size 2: please divide, divide _, _ sentence, sentence _, _ biword, biword shingles (assuming that "this" and "into" are stopwords)
I would like to eliminate those shingles with the filler tokens, i.e. my desired output contains only: please divide, biword shingles.
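To make the example above concrete, here is a minimal Python sketch that imitates the behaviour described in the question. It is a simplified model, not Lucene's actual ShingleFilter: the helper names (`remove_stopwords`, `shingles`, `without_fillers`) are made up for illustration, and position gaps left by the stop filter are modelled as `None`.

```python
FILLER = "_"

def remove_stopwords(tokens, stopwords):
    """Model the stop filter: a removed word leaves a position gap (None)."""
    return [None if t in stopwords else t for t in tokens]

def shingles(slots, size=2):
    """Sliding window over positions; each gap is rendered as the filler token."""
    words = [FILLER if s is None else s for s in slots]
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

def without_fillers(shingle_list):
    """Keep only shingles in which no component is the filler token."""
    return [s for s in shingle_list if FILLER not in s.split(" ")]

text = "please divide this sentence into biword shingles"
slots = remove_stopwords(text.split(), {"this", "into"})

all_shingles = shingles(slots, size=2)
desired = without_fillers(all_shingles)

print(all_shingles)
print(desired)
```

The first list reproduces the six bigrams listed in the question, and the second is the desired output with only "please divide" and "biword shingles".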
I have a dedicated field for facets with shingles up to 4-grams. Because of these stopwords, all the facet constraints (or values) look useless with fillers in them, like "divide _ sentence _".
Please advise.
Using Solr 4.4.
Update
I thought of setting enablePositionIncrements to false in the StopFilter configuration. I'm not sure whether that would solve the problem, and in any case Lucene 4.4 no longer supports it.
Recommended answer
Add PatternReplaceFilterFactory to your analyzer chain after ShingleFilterFactory, and replace every token that contains the filler token with an empty string, i.e. "".
This may solve your problem temporarily, but for a permanent solution you will have to write your own analyzer or customize ShingleFilter.
Sample field type:
<fieldType name="text_general_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*_.*" replacement=""/>
  </analyzer>
</fieldType>
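The effect of the pattern `.*_.*` can be checked outside Solr. The Python sketch below is only a rough model of a replace-all regex filter, using Python's `re` module rather than Java's regex engine, and the helper names `pattern_replace` and `drop_empty` are hypothetical. It illustrates two things worth keeping in mind: the matching tokens become empty strings rather than disappearing, and any ordinary token that happens to contain an underscore (the made-up `foo_bar` here) is blanked as well.

```python
import re

# Same pattern as in the field type above: any token containing an underscore.
FILLER_PATTERN = re.compile(r".*_.*")

def pattern_replace(tokens, pattern=FILLER_PATTERN, replacement=""):
    """Rewrite each token's text; the token itself stays in the stream."""
    return [pattern.sub(replacement, t) for t in tokens]

def drop_empty(tokens):
    """Assumed extra step (not in the answer above): discard empty tokens."""
    return [t for t in tokens if t]

tokens = ["please", "please divide", "divide", "divide _",
          "_ sentence", "sentence", "foo_bar"]

replaced = pattern_replace(tokens)
kept = drop_empty(replaced)

print(replaced)
print(kept)
```

If the empty tokens turn out to matter for faceting, one option to try would be a length filter after the pattern replacement (for example solr.LengthFilterFactory with a minimum length of 1). That is a suggestion to verify against your Solr version, not something the answer above prescribes.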