I have inserted 3 records into an ElasticSearch index as follows:

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "w bridgewater",
    "raw_name" : "W BRIDGEWATER"
  },
  { "language" : "ENG",
    "name" : "west bridgewater",
    "raw_name" : "West Bridgewater"
  }
],
"id" : 1,
"streetNames" : [ { "language" : "ENG",
    "name" : "cram rd",
    "raw_name" : "Cram Rd"
  } ]
}'

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "bridgewater corners",
    "raw_name" : "BRIDGEWATER CORNERS"
  },
  { "language" : "ENG",
    "name" : "bridgewater center",
    "raw_name" : "Bridgewater Center"
  }
],
"id" : 2,
"streetNames" : [ { "language" : "ENG",
    "name" : "valley view rd",
    "raw_name" : "Valley View Rd"
  } ]
}'

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "bridgewater",
    "raw_name" : "Bridgewater"
  },
  { "language" : "ENG",
    "name" : "windsor",
    "raw_name" : "Windsor"
  }
],
"id" : 3,
"streetNames" : [ { "language" : "ENG",
    "name" : "valley view rd",
    "raw_name" : "Valley View Rd"
  } ]
}'

I then run the following search:
curl -XGET 'http://127.0.0.1:9200/geoindex_test/STREET/_search?pretty=1'  -d '
{
"query" : {
    "match" : { "cityNames.name" : "bridgewater" }
}
}'

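I expected ElasticSearch to return the third record (id == 3) as the best match, since record 3 is the only exact match for "bridgewater". Instead, it returns the record with ID 1 (w bridgewater) as the best match. What am I doing wrong?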

Best answer

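I suspect this happens because you are using inner objects, which essentially collapse the objects beneath them into a single object for searching. So, for example, when you query the name field of object 1, you are querying against ["w bridgewater", "west bridgewater"] as a whole, not against the separate fields you might be imagining.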
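One way to confirm this is to look at the mapping Elasticsearch created dynamically when the documents were indexed. This is a minimal sketch assuming the same local node and index as in the question. If `cityNames` appears with only `properties` and no `"type" : "nested"`, it was mapped as an ordinary inner object. In that case its `name` values are indexed together as one multi-valued `cityNames.name` field.

```shell
# Show the mapping that dynamic mapping generated for geoindex_test.
# Check "cityNames": without "type" : "nested" it is a plain object
# field, so all of its "name" values end up in one multi-valued
# field, cityNames.name.
curl -XGET 'http://127.0.0.1:9200/geoindex_test/_mapping?pretty=1'
```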

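Because "bridgewater" appears twice in objects 1 and 2 (once in each of their two name fields) but only once in object 3, objects 1 and 2 rank higher in the search. Object 1 ended up on top because the fields containing "bridgewater" are shorter in object 1 than in object 2 ("w bridgewater" vs. "bridgewater corners").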
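To see where the scores actually come from, you can re-run the same search with `"explain" : true`. This attaches an `_explanation` to every hit that breaks its score into term-frequency, inverse-document-frequency and field-length parts. This is a sketch assuming the index from the question. Newer Elasticsearch releases require the `Content-Type` header, and it should be harmless on older ones.

```shell
# Same match query as in the question, but ask Elasticsearch to
# explain how each hit's score was computed (see "_explanation").
curl -XGET 'http://127.0.0.1:9200/geoindex_test/STREET/_search?pretty=1' -H 'Content-Type: application/json' -d '
{
  "explain" : true,
  "query" : {
    "match" : { "cityNames.name" : "bridgewater" }
  }
}'
```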

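Instead of the inner objects you are using now, use nested objects: http://www.elasticsearch.org/guide/reference/mapping/nested-type/. Setting the score mode (score_mode) of the nested query to "max" will make the matching behave more intuitively.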
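Below is a minimal sketch of that change. It assumes the typed-mapping API the question uses (Elasticsearch before 7.0). On 7.x and later, the `STREET` level would drop out of both the mapping and the URLs. An existing object field cannot be switched to `nested` in place, so the index is deleted and recreated first. Only `cityNames` is declared here, and its sub-fields are still mapped dynamically.

```shell
# WARNING: this deletes the test index and every document in it.
curl -XDELETE 'http://127.0.0.1:9200/geoindex_test'

# Recreate the index with cityNames mapped as nested, so each city
# name is indexed as its own hidden sub-document instead of being
# flattened into the parent.
curl -XPUT 'http://127.0.0.1:9200/geoindex_test' -H 'Content-Type: application/json' -d '
{
  "mappings" : {
    "STREET" : {
      "properties" : {
        "cityNames" : { "type" : "nested" }
      }
    }
  }
}'

# Re-run the three POST commands from the question at this point.

# Make the re-indexed documents visible to search right away.
curl -XPOST 'http://127.0.0.1:9200/geoindex_test/_refresh'

# Match against each cityNames entry separately and score each parent
# document by its best-matching entry (score_mode "max").
curl -XGET 'http://127.0.0.1:9200/geoindex_test/STREET/_search?pretty=1' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "nested" : {
      "path" : "cityNames",
      "score_mode" : "max",
      "query" : {
        "match" : { "cityNames.name" : "bridgewater" }
      }
    }
  }
}'
```

With `max`, each street document is scored by its single best-matching city name rather than by all of its names combined. A second name that also contains "bridgewater" therefore no longer lifts documents 1 and 2, and the one-word "bridgewater" entry in document 3 should be able to rank first.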

10-01 15:29