Elasticsearch: match every position only once


Problem description




In my Elasticsearch index I have documents that have multiple tokens at the same position.

I want to get a document back when I match at least one token at every position. The order of the tokens is not important. How can I accomplish that? I use Elasticsearch 0.90.5.

Example:

I index a document like this.

{
    "field":"red car"
}

I use a synonym token filter that adds synonyms at the same positions as the original token. So now in the field, there are 2 positions:

  • Position 1: "red"
  • Position 2: "car", "automobile"
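
To make the position layout and the desired matching rule concrete, here is a minimal Python sketch. It is not Elasticsearch code: the whitespace tokenizer, the `SYNONYMS` table, and the function names are assumptions made up for illustration.

```python
# Hypothetical synonym rule; stands in for the synonym token filter.
SYNONYMS = {"car": ["automobile"]}

def analyze(text):
    """Return (position, token) pairs with 1-based positions."""
    tokens = []
    for position, word in enumerate(text.lower().split(), start=1):
        tokens.append((position, word))
        for synonym in SYNONYMS.get(word, []):
            # A synonym shares the position of the token it was derived from.
            tokens.append((position, synonym))
    return tokens

def matches_every_position(doc_tokens, query_terms):
    """True if each position holds at least one token found in the query."""
    positions = {}
    for position, token in doc_tokens:
        positions.setdefault(position, set()).add(token)
    return all(tokens & query_terms for tokens in positions.values())

doc = analyze("red car")
wanted_hit = matches_every_position(doc, {"red", "automobile"})
wanted_miss = matches_every_position(doc, {"a", "car", "is", "an", "automobile"})
```

Under these assumptions, a query containing "red" and "automobile" covers both positions, while "a car is an automobile" leaves position 1 uncovered.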

My solution for now:

To be able to ensure that all positions match, I index the maximum position as well.

{
    "field":"red car",
    "max_position": 2
}

I have a custom similarity that extends DefaultSimilarity and returns 1 for tf(), idf() and lengthNorm(). The resulting score is the number of matching terms in the field.

Query:

{
    "custom_score": {
        "query": {
             "match": {
                 "field": "a car is an automobile"
             }
        },
        "_script": "_score*100/doc[\"max_position\"]+_score"
    },
    "min_score":"100"
}

Problem with my solution:

The above search should not match the document, because there is no token "red" in the query string. But it matches, because Elasticsearch counts the matches for car and automobile as two matches and that gives a score of 2 which leads to a script score of 102, which satisfies the "min_score".
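
The arithmetic behind this false positive can be checked with a short Python sketch. It assumes, as stated in the question, that the raw score equals the number of matching terms; the variable names and the per-position variant are illustrative only and are not something Elasticsearch computes.

```python
MIN_SCORE = 100

def script_score(score, max_position):
    # Mirrors the question's script: _score*100/doc["max_position"]+_score
    return score * 100 / max_position + score

doc_positions = [{"red"}, {"car", "automobile"}]
query_terms = {"a", "car", "is", "an", "automobile"}

# What happens today: every matching term counts, even two at one position.
per_term = sum(len(tokens & query_terms) for tokens in doc_positions)

# What the question asks for: each position counts at most once.
per_position = sum(1 for tokens in doc_positions if tokens & query_terms)

current = script_score(per_term, max_position=2)
desired = script_score(per_position, max_position=2)
```

With per-term counting the score is 2 and the script yields 102, which passes `min_score`. Counting each position once gives a score of 1 and a script result of 51, which would be rejected.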

Solution

If you needed to guarantee 100% matches against the query terms you could use minimum_should_match. This is the more common case.
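
For reference, a query of that kind might look like the sketch below, built here as a Python dict and serialized to JSON. The field name and query text are placeholders taken from the question, and the `"100%"` value is an assumption about what "100% matches" would mean in practice.

```python
import json

# Sketch of a match query that requires every query term to match.
# "field" and the query text come from the question; "100%" is assumed.
request_body = {
    "query": {
        "match": {
            "field": {
                "query": "red car",
                "minimum_should_match": "100%",
            }
        }
    }
}

serialized = json.dumps(request_body, indent=2)
```

Note that this constrains the query terms only: it says nothing about whether every position in the indexed field was covered.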


Unfortunately, in your case, you want to guarantee 100% matches of the indexed terms. To do this, you'll have to drop down to the Lucene level and write a custom Similarity class in Java, because you need access to low-level index information that is not exposed to the Query DSL:

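Per document/field scanned in the query scorer:

  • Number of terms matched ("overlap" is the Lucene term; it is used in the coord() method of the DefaultSimilarity class)
  • Total number of analyzed terms in the field (there are a few different ways to obtain this)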

Then your custom similarity (you can probably even extend DefaultSimilarity) will need to detect queries where terms matched < total terms and multiply their score by zero.

Since query and index-time analysis have already happened at this level of scoring, the total number of indexed terms will already be expanded to include synonyms, as should the query terms, avoiding the false-positive "a car is an automobile" issue above.
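
The gating rule described in this answer can be sketched in plain Python. This is only a model of the logic, not a Lucene Similarity implementation: the synonym table, the whitespace tokenizer, and the function names are hypothetical, and synonyms are assumed to be applied at both index and query time.

```python
# Hypothetical synonym table, applied at both index and query time.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}

def expand(text):
    """Whitespace-tokenize and add synonyms, returning a set of terms."""
    terms = set()
    for word in text.lower().split():
        terms.add(word)
        terms |= SYNONYMS.get(word, set())
    return terms

def gated_score(raw_score, indexed_terms, query_terms):
    """Zero the score unless every indexed term was matched."""
    matched = len(indexed_terms & query_terms)
    return raw_score if matched == len(indexed_terms) else 0.0

indexed = expand("red car")
partial = gated_score(2.0, indexed, expand("a car is an automobile"))
full = gated_score(3.0, indexed, expand("red car"))
```

Here the indexed field holds three terms ("red", "car", "automobile"). The query "a car is an automobile" matches only two of them, so its score is zeroed, while "red car" expands to all three and keeps its score.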


08-15 18:16