Problem description
I'm trying to get Solr to extract only the second, 7-digit portion of a ticket formatted like n-nnnnnnn.
Originally I hoped to keep the full ticket together. According to the documentation, digits with numbers should be kept together, but after hammering away at this problem for some time and looking at the code, I don't think that's the case: Solr always generates two terms. So rather than getting large numbers of matches on the first digit of n-, I'm thinking I can get better query results from just the second portion. Substituting an A for the dash:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\d[A](\d\d\d\d\d\d\d)" replacement="$1" replace="all"
maxBlockChars="20000"/>
will parse 1A1234567 fine, but
[...]-" replacement="$1" replace="all" maxBlockChars="20000"/>
will not parse 1-1234567.
So it looks like the problem is just the hyphen. I've tried \- (escaped), [-], \u002D, \x{45} and \x045 without success.
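One way to separate the regex from the rest of the Solr analysis chain is to run the same pattern through java.util.regex directly, since that is the engine PatternReplaceCharFilterFactory relies on. The sketch below does only that and does not exercise Solr itself; the class name HyphenRegexCheck and the helper strip are made up for illustration. If the pattern behaves here, the hyphen is probably being lost somewhere other than the regex.

```java
import java.util.regex.Pattern;

// Hypothetical standalone check; not part of Solr or Lucene.
class HyphenRegexCheck {
    // The question's pattern with its backslashes restored; "[-]" is a literal hyphen.
    static final Pattern TICKET = Pattern.compile("\\d[-](\\d\\d\\d\\d\\d\\d\\d)");

    // Mirrors replacement="$1" with replace="all": keep only the captured 7 digits.
    static String strip(String input) {
        return TICKET.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("1-1234567"));             // prints 1234567
        System.out.println(strip("ticket 1-1234567 open")); // prints ticket 1234567 open

        // \x2D is the hex escape for '-'. \x{45} is code point 0x45, the letter 'E',
        // because 45 is the decimal value of the hyphen, not the hex value.
        System.out.println(Pattern.matches("\\x2D", "-"));   // prints true
        System.out.println(Pattern.matches("\\x{45}", "-")); // prints false
        System.out.println(Pattern.matches("\\x{45}", "E")); // prints true
    }
}
```

Note that of the escapes tried in the question, \u002D is a correct spelling of the hyphen, while \x{45} cannot match it in this regex dialect.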
I've tried putting mapping char filters around it:
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\d[-](\d\d\d\d\d\d\d)" replacement="$1" replace="all" maxBlockChars="20000"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping2.txt"/>
with the mappings:
"-" => "z"
and then
"z" => "-"
It looks like the hyphen is eaten up in the Flex tokenization and isn't even available to the char filter.
Has anyone had more success with hyphens/dashes in Solr/Lucene? Thanks.
Recommended answer
If your Solr is using a recent Lucene (3.x+, I think), you will want to use ClassicAnalyzer rather than StandardAnalyzer, as StandardAnalyzer now always treats hyphens as a delimiter.