I've got an unusual situation. Normally when you search a text index you are searching for a small number of keywords against documents with a larger number of terms.
For example you might search for "quick brown" and expect to match "the quick brown fox jumps over the lazy dog".
My situation is the reverse: I have lots of small phrases in my document store and I wish to match them against a larger query phrase.
For example, if I have the query:
- "the quick brown fox jumps over the lazy dog"

and the documents:
- "quick brown"
- "fox over"
- "lazy dog"
I'd like to find the documents that have a phrase that occurs in the query. In this case "quick brown" and "lazy dog" (but not "fox over" because although the tokens match it's not a phrase in the search string).
Is this sort of query possible with SOLR/lucene?
It sounds like you want to use ShingleFilter in your analysis, so that you index word bigrams: so add ShingleFilterFactory at both query and index time.
At index time your documents are then indexed as such:
- "quick brown" -> quick_brown
- "fox over" -> fox_over
- "lazy dog" -> lazy_dog
At query time your query becomes:
- "the quick brown fox jumps over the lazy dog" -> "the_quick quick_brown brown_fox fox_jumps jumps_over over_the the_lazy lazy_dog"
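To make the shingling step concrete, here is a minimal Python sketch that mimics what a bigram-only shingle filter would emit. It is not Lucene or Solr code: the `shingles` helper is hypothetical, and it assumes whitespace tokenization, lowercasing, `_` as the token separator, and no unigram output.

```python
def shingles(text, size=2, sep="_"):
    """Return the word n-grams (shingles) of `text`, each joined by `sep`.

    Assumption: tokens are produced by lowercasing and splitting on whitespace,
    which is far simpler than a real analyzer chain.
    """
    tokens = text.lower().split()
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# Index time: each short document becomes a single bigram term.
indexed_terms = [shingles(doc) for doc in ["quick brown", "fox over", "lazy dog"]]

# Query time: the long query becomes one bigram per adjacent word pair.
query_terms = shingles("the quick brown fox jumps over the lazy dog")
```

A nine-word query yields eight bigrams, one for every adjacent pair of words.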
This is still no good: by default it will form a phrase query. So, in your query analyzer only, add PositionFilterFactory after the ShingleFilterFactory. This "flattens" the positions in the query so that the query parser treats the output as synonyms, which yields a BooleanQuery with these sub-clauses (all SHOULD clauses, so it's basically an OR query):
- the_quick OR
- quick_brown OR
- brown_fox OR
- ...
This should be the most performant way, as then it's really just a BooleanQuery of TermQuery clauses.
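The effect of that OR query can be imitated in plain Python: a document matches if any of its indexed shingles equals any shingle of the query. This is only an illustration of the matching logic, under the assumptions of whitespace tokens, bigrams, and a `_` separator; the helper names are made up and none of this is Solr or Lucene API.

```python
def shingles(text, size=2, sep="_"):
    """Return the set of word n-grams of `text`, each joined by `sep`."""
    tokens = text.lower().split()
    return {sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def matching_documents(query, documents):
    """Return documents sharing at least one shingle with the query (OR semantics)."""
    query_terms = shingles(query)
    return [doc for doc in documents if shingles(doc) & query_terms]


docs = ["quick brown", "fox over", "lazy dog"]
hits = matching_documents("the quick brown fox jumps over the lazy dog", docs)
print(hits)  # ['quick brown', 'lazy dog']
```

"fox over" is rejected because the query never produces the term `fox_over`, even though both words occur in it.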
