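I have an Elasticsearch index with a field intended for exact matching. Somehow I get a lot of similar results (which I don't mind), and those similar results are ranked ahead of the exact match (which I do mind).

Can someone explain what is going on and how to fix it?

My mapping looks like this: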

"exact":{
  "type":"string",
  "boost":10.0,
  "analyzer":"keyword"
},

My query, searching for "AAPL P JAN 2014 885,00", looks like this:
{
  "size" : 21,
  "query" : {
    "field" : {
      "exact" : "AAPL P JAN 2014 885,00"
    }
  },
  "explain" : true,
  "sort" : [ {
    "_score" : {
      "order" : "desc"
    }
  } ],
  "facets" : {
    "category" : {
      "terms" : {
        "field" : "category",
        "size" : 10
      }
    }
  }
}

The documents come back in this order:
  • {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"}
  • {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"}
  • {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"}
  • and so on, with the exact match a number of results further down.

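Can someone explain to me why the exact match is not ranked first?

Below are the search results with the full explanation, in case that helps to make sense of things.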
    "hits" : [ {
      "_shard" : 0,
      "_node" : "1",
      "_index" : "instruments",
      "_type" : "instrument",
      "_id" : "AAPL",
      "_score" : 1306.8339, "_source" : {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"},
      "_explanation" : {
        "value" : 1306.8339,
        "description" : "product of:",
        "details" : [ {
          "value" : 6534.169,
          "description" : "sum of:",
          "details" : [ {
            "value" : 6534.169,
            "description" : "weight(exact:AAPL in 9096), product of:",
            "details" : [ {
              "value" : 0.25854474,
              "description" : "queryWeight(exact:AAPL), product of:",
              "details" : [ {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 0.0419026,
                "description" : "queryNorm"
              } ]
            }, {
              "value" : 25272.875,
              "description" : "fieldWeight(exact:AAPL in 9096), product of:",
              "details" : [ {
                "value" : 1.0,
                "description" : "tf(termFreq(exact:AAPL)=1)"
              }, {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 4096.0,
                "description" : "fieldNorm(field=exact, doc=9096)"
              } ]
            } ]
          } ]
        }, {
          "value" : 0.2,
          "description" : "coord(1/5)"
        } ]
      }
    }, {
      "_shard" : 0,
      "_node" : "1",
      "_index" : "instruments",
      "_type" : "instrument",
      "_id" : "AAPL*PUT*20140118*675",
      "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"},
      "_explanation" : {
        "value" : 163.35423,
        "description" : "product of:",
        "details" : [ {
          "value" : 816.7711,
          "description" : "sum of:",
          "details" : [ {
            "value" : 816.7711,
            "description" : "weight(exact:AAPL in 18), product of:",
            "details" : [ {
              "value" : 0.25854474,
              "description" : "queryWeight(exact:AAPL), product of:",
              "details" : [ {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 0.0419026,
                "description" : "queryNorm"
              } ]
            }, {
              "value" : 3159.1094,
              "description" : "fieldWeight(exact:AAPL in 18), product of:",
              "details" : [ {
                "value" : 1.0,
                "description" : "tf(termFreq(exact:AAPL)=1)"
              }, {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 512.0,
                "description" : "fieldNorm(field=exact, doc=18)"
              } ]
            } ]
          } ]
        }, {
          "value" : 0.2,
          "description" : "coord(1/5)"
        } ]
      }
    }, {
      "_shard" : 0,
      "_node" : "1",
      "_index" : "instruments",
      "_type" : "instrument",
      "_id" : "AAPL*CALL*20140118*500",
      "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"},
      "_explanation" : {
        "value" : 163.35423,
        "description" : "product of:",
        "details" : [ {
          "value" : 816.7711,
          "description" : "sum of:",
          "details" : [ {
            "value" : 816.7711,
            "description" : "weight(exact:AAPL in 383), product of:",
            "details" : [ {
              "value" : 0.25854474,
              "description" : "queryWeight(exact:AAPL), product of:",
              "details" : [ {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 0.0419026,
                "description" : "queryNorm"
              } ]
            }, {
              "value" : 3159.1094,
              "description" : "fieldWeight(exact:AAPL in 383), product of:",
              "details" : [ {
                "value" : 1.0,
                "description" : "tf(termFreq(exact:AAPL)=1)"
              }, {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 512.0,
                "description" : "fieldNorm(field=exact, doc=383)"
              } ]
            } ]
          } ]
        }, {
          "value" : 0.2,
          "description" : "coord(1/5)"
        } ]
      }
    }, {
      "_id" : "AAPL*PUT*20140118*940",
      "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 940,00"],"id-compound":"AAPL*PUT*20140118*940"},
      "_explanation" : {
        "value" : 163.35423,
        "description" : "product of:",
        "details" : [ {
          "value" : 816.7711,
          "description" : "sum of:",
          "details" : [ {
            "value" : 816.7711,
            "description" : "weight(exact:AAPL in 794), product of:",
            "details" : [ {
              "value" : 0.25854474,
              "description" : "queryWeight(exact:AAPL), product of:",
              "details" : [ {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 0.0419026,
                "description" : "queryNorm"
              } ]
            }, {
              "value" : 3159.1094,
              "description" : "fieldWeight(exact:AAPL in 794), product of:",
              "details" : [ {
                "value" : 1.0,
                "description" : "tf(termFreq(exact:AAPL)=1)"
              }, {
                "value" : 6.1701355,
                "description" : "idf(docFreq=211, maxDocs=37299)"
              }, {
                "value" : 512.0,
                "description" : "fieldNorm(field=exact, doc=794)"
              } ]
            } ]
          } ]
        }, {
          "value" : 0.2,
          "description" : "coord(1/5)"
        } ]
      }
    }
    

In case it helps, this is what happens when I analyze the data that is to be stored:
    curl -XGET 'localhost:9200/instruments/_analyze?field=exact&pretty=true' -d 'ING  P JUN 2013 6.00'
    {
      "tokens" : [ {
        "token" : "ING  P JUN 2013 6.00",
        "start_offset" : 0,
        "end_offset" : 20,
        "type" : "word",
        "position" : 1
      } ]
    

Best answer

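As you can see from the explain output, every returned document matches on the single term "AAPL" and on nothing else. That term always appears once in the field (tf = 1) and occurs in 211 of 37299 documents (idf = 6.1701355), so those factors are identical for all hits. The scores differ only through fieldNorm (4096 for the first hit, 512 for the others), which is why the last three hits have exactly the same score. The field norms are that high because you are using index-time boosting (the boost of 10 in your mapping). That is not a big deal here, since the match is always on the same field. It only means that if you also had a match on another field, the exact field would always win, which probably makes sense in your case.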
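The arithmetic behind those numbers can be retraced by hand. The sketch below is only an illustration: it copies the factors from the explain output above and multiplies them in the order the explain tree nests them. The function name `classic_score` is made up for this example and is not an Elasticsearch or Lucene API.

```python
# Factors copied from the _explanation of the first hit (_id "AAPL").
idf = 6.1701355          # idf(docFreq=211, maxDocs=37299)
query_norm = 0.0419026   # queryNorm
tf = 1.0                 # tf(termFreq(exact:AAPL)=1)
coord = 1 / 5            # coord(1/5)


def classic_score(tf, idf, query_norm, field_norm, coord):
    """Multiply the factors in the order the explain tree nests them.

    Hypothetical helper for illustration only.
    """
    query_weight = idf * query_norm        # queryWeight(exact:AAPL)
    field_weight = tf * idf * field_norm   # fieldWeight(exact:AAPL)
    return query_weight * field_weight * coord


# fieldNorm is the only factor that differs between the hits.
first_hit = classic_score(tf, idf, query_norm, 4096.0, coord)
other_hits = classic_score(tf, idf, query_norm, 512.0, coord)

print(round(first_hit, 2), round(other_hits, 2))
print(first_hit / other_hits)  # ratio of the two fieldNorm values
```

Up to float rounding, the two results agree with the reported `_score` values of 1306.8339 and 163.35423, and their ratio is the ratio of the two fieldNorm values.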

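The real problem, though, is that when I look at your documents, none of them is an exact match for AAPL P JAN 2014 885,00. What I see is that only one of the 5 terms in the query matches, which is also confirmed by the coord factor in your explain output: coord(1/5).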
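A rough way to picture where coord(1/5) comes from is sketched below. It assumes that the query text is split on whitespace into five separate term clauses before each piece reaches the keyword analyzer. That assumption is consistent with the explain output but is a simplification of what the query parser does. The set of indexed terms is taken from the first returned document.

```python
query_text = "AAPL P JAN 2014 885,00"

# Assumption: the query text is split on whitespace first, so the query
# ends up with five separate term clauses.
query_terms = query_text.split()

# Values of the "exact" field in the first returned document. With the
# keyword analyzer, each array element is indexed as one whole term.
indexed_terms = {"APPLE INC", "US0378331005", "AAPL", "73773"}

# Only clauses that equal an indexed term exactly count as matches.
matched = [term for term in query_terms if term in indexed_terms]
coord = len(matched) / len(query_terms)

print(matched)  # ['AAPL']
print(coord)    # 0.2
```

Only "AAPL" lines up with a whole indexed term, so one clause out of five matches.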

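It looks like the keyword analyzer is being applied, but as you can see from the returned documents, you are not sending the content of the exact field as a single value but as an array of values. Because you are using the keyword analyzer, each item is not tokenized any further, but you still end up with multiple tokens in the field. I guess you need to check how you are indexing the documents.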
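One way to check what was actually indexed, offered here as a suggestion rather than as part of the answer, is to look up the whole string as a single term. A term query is not analyzed, so it only matches documents in which that exact string is one of the indexed terms. The sketch below only builds and prints a request body. Sending it to the index is left out, and the field name and search string are taken from the question.

```python
import json

# Illustrative request body: the term query looks up the string as-is,
# without splitting or analyzing it.
body = {
    "query": {
        "term": {
            "exact": "AAPL P JAN 2014 885,00"
        }
    },
    "explain": True,
}

print(json.dumps(body, indent=2))
```

If this returns no hits, the document carrying that value was probably not indexed the way you expect.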
