ElasticSearch: partial/exact scoring with edge_ngram & fuzziness

Question

In ElasticSearch I am trying to get correct scoring using edge_ngram with fuzziness. I would like exact matches to have the highest score and partial (sub-string) matches to have lower scores. Below are my setup and scoring results.

settings: {
   number_of_shards: 1,
   analysis: {
      filter: {
         ngram_filter: {
            type: 'edge_ngram',
            min_gram: 2,
            max_gram: 20
         }
      },
      analyzer: {
         ngram_analyzer: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
               'lowercase',
               'ngram_filter'
            ]
         }
      }
   }
},
mappings: [{
   name: 'voter',
   _all: {
      type: 'string',
      index_analyzer: 'ngram_analyzer',
      search_analyzer: 'standard'
   },
   properties: {
      last: {
         type: 'string',
         required: true,
         include_in_all: true,
         term_vector: 'yes',
         index_analyzer: 'ngram_analyzer',
         search_analyzer: 'standard'
      },
      first: {
         type: 'string',
         required: true,
         include_in_all: true,
         term_vector: 'yes',
         index_analyzer: 'ngram_analyzer',
         search_analyzer: 'standard'
      }
   }
}]
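
To make the effect of this analyzer concrete, here is a minimal Python sketch of the terms it would index for a name. It is only an approximation: the helper names `edge_ngrams` and `ngram_analyze` are hypothetical, and whitespace splitting stands in for the real `standard` tokenizer.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading-edge n-grams of one token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngram_analyze(text, min_gram=2, max_gram=20):
    """Rough stand-in for ngram_analyzer (assumption: whitespace split
    replaces the standard tokenizer): lowercase, then edge n-gram."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


print(ngram_analyze("Michael"))
```

Under these assumptions a single document with the first name "Michael" contributes six terms to the index, "mi" through "michael", which is why every shortened query in the question still finds it.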

After indexing (POST) a document with the first name "Michael", I run the query below, setting the query string in turn to "Michael", "Michae", "Micha", "Mich", "Mic", and "Mi".

GET voter/voter/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "Michael",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}

My score results are:

-"Michael": 0.19535106
-"Michae": 0.2242768
-"Micha": 0.24513611
-"Mich": 0.22340237
-"Mic": 0.21408978
-"Mi": 0.15438235

As you can see, the scores do not come out as expected. I would like "Michael" to have the highest score and "Mi" the lowest.
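
One plausible contributor to this ordering is that a fuzzy query with an edit distance of 2 matches several of the indexed edge n-grams at once, and mid-length queries sit within 2 edits of more of them. The sketch below counts those candidates using a plain Levenshtein distance. It is an illustration under assumptions, not Lucene's actual fuzzy automaton or scoring formula, and `fuzzy_candidates` is a hypothetical helper.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_candidates(query, indexed_terms, fuzziness=2, prefix_length=1):
    """Indexed terms that share the required prefix and lie within
    `fuzziness` edits of the lowercased query (simplified model)."""
    q = query.lower()
    prefix = q[:prefix_length]
    return [t for t in indexed_terms
            if t.startswith(prefix) and levenshtein(q, t) <= fuzziness]


# Terms the edge_ngram analyzer would index for "Michael" (min_gram 2).
INDEXED = ["mi", "mic", "mich", "micha", "michae", "michael"]

for query in ["Michael", "Michae", "Micha", "Mich", "Mic", "Mi"]:
    hits = fuzzy_candidates(query, INDEXED)
    print(query, len(hits), hits)
```

In this simplified model the candidate counts are 3, 4, 5, 5, 4, 3 for "Michael" down to "Mi", so the full name and the shortest prefix match the fewest indexed terms. That is at least consistent with the scores listed above, although real scoring also depends on term frequency, field norms, and how the fuzzy query is rewritten, none of which is modeled here.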

Any help would be appreciated!

Recommended answer

One way to approach this problem is to add a raw version of the text to your mapping as a sub-field, like this:

last: {
   type: 'string',
   required: true,
   include_in_all: true,
   term_vector: 'yes',
   index_analyzer: 'ngram_analyzer',
   search_analyzer: 'standard',
   fields: {
      raw: {
         type: 'string'   // indexed with the standard analyzer
      }
   }
},
first: {
   type: 'string',
   required: true,
   include_in_all: true,
   term_vector: 'yes',
   index_analyzer: 'ngram_analyzer',
   search_analyzer: 'standard',
   fields: {
      raw: {
         type: 'string'   // indexed with the standard analyzer
      }
   }
}

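You could also make the raw sub-field match exactly by setting index: 'not_analyzed' on it. The value is then stored as a single unmodified term, so matching on it becomes case-sensitive.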
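
The difference between the two variants of the raw sub-field can be sketched in a few lines of Python. This is a simplified model for single-word values only, and `raw_terms` and `raw_match` are hypothetical helpers rather than anything Elasticsearch exposes.

```python
def raw_terms(value, not_analyzed=False):
    """Terms stored for a single-word raw sub-field (simplified):
    standard analysis lowercases the word, not_analyzed keeps it verbatim."""
    return {value} if not_analyzed else {value.lower()}


def raw_match(query, value, not_analyzed=False):
    """Would a match query on the raw sub-field hit this value?
    The query text goes through the same treatment as the indexed value."""
    search_term = query if not_analyzed else query.lower()
    return search_term in raw_terms(value, not_analyzed)


for q in ["Michael", "michael", "Mich"]:
    print(q,
          raw_match(q, "Michael"),                      # standard-analyzed raw
          raw_match(q, "Michael", not_analyzed=True))   # not_analyzed raw
```

Either way a prefix such as "Mich" never matches the raw sub-field, so only a full-name query can pick up the extra clause that the raw sub-field makes possible.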

Then you can query like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_all": {
              "query": "Michael",
              "fuzziness": 2,
              "prefix_length": 1
            }
          }
        },
        {
          "match": {
            "last.raw": {
              "query": "Michael",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "first.raw": {
              "query": "Michael",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}

Documents that match more clauses will be scored higher. You can adjust the boost values according to your requirements.
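
If the same query has to be issued for many search terms, the body can be generated rather than written by hand. The following is a minimal sketch: `build_name_query` and its parameter names are hypothetical, and the field names `_all`, `last.raw`, and `first.raw` are taken from the mapping above.

```python
import json


def build_name_query(name, exact_boost=5, fuzziness=2, prefix_length=1):
    """Build the bool/should body: one fuzzy clause on _all plus
    boosted clauses on the raw sub-fields."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"_all": {
                        "query": name,
                        "fuzziness": fuzziness,
                        "prefix_length": prefix_length,
                    }}},
                    {"match": {"last.raw": {"query": name, "boost": exact_boost}}},
                    {"match": {"first.raw": {"query": name, "boost": exact_boost}}},
                ]
            }
        }
    }


print(json.dumps(build_name_query("Michael"), indent=2))
```

The resulting dictionary can be serialized and sent as the request body of the search. Raising or lowering `exact_boost` changes how strongly a full-name hit on a raw sub-field outweighs the fuzzy n-gram match.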

09-09 00:24