elasticsearch - Elasticsearch returns duplicate results

I have the following search/City index, whose documents have a name and many other properties. I run the following aggregation search:

{
"size": 0,
"query": {
    "multi_match" : {
        "query": "ana",
        "fields": [ "cityName" ],
        "type" : "phrase_prefix"
    }
},
"aggs": {
    "res": {
        "terms": {
            "field": "cityName"
        },
        "aggs":{
            "dedup_docs":{
                "top_hits":{
                    "size":1
                }
            }
        }
    }
}
}

As a result, I get three buckets with the keys "anaheim", "ana" and "santa". The result looks like this:

"buckets": [
    {
      "key": "anaheim",
      "doc_count": 11,
      "dedup_docs": {
        "hits": {
          "total": 11,
          "max_score": 5.8941016,
          "hits": [
            {
              "_index": "search",
              "_type": "City",
              "_id": "310",
              "_score": 5.8941016,
              "_source": {
                "id": 310,
                "country": "USA",
                "stateCode": "CA",
                "stateName": "California",
                "cityName": "Anaheim",
                "postalCode": "92806",
                "latitude": 33.822738,
                "longitude": -117.881633
              }
            }
          ]
        }
      }
    },
    {
      "key": "ana",
      "doc_count": 4,
      "dedup_docs": {
        "hits": {
          "total": 4,
          "max_score": 2.933612,
          "hits": [
            {
              "_index": "search",
              "_type": "City",
              "_id": "154",
              "_score": 2.933612,
              "_source": {
                "id": 154,
                "country": "USA",
                "stateCode": "CA",
                "stateName": "California",
                "cityName": "Santa Ana",
                "postalCode": "92706",
                "latitude": 33.767371,
                "longitude": -117.868255
              }
            }
          ]
        }
      }
    },
    {
      "key": "santa",
      "doc_count": 4,
      "dedup_docs": {
        "hits": {
          "total": 4,
          "max_score": 2.933612,
          "hits": [
            {
              "_index": "search",
              "_type": "City",
              "_id": "154",
              "_score": 2.933612,
              "_source": {
                "id": 154,
                "country": "USA",
                "stateCode": "CA",
                "stateName": "California",
                "cityName": "Santa Ana",
                "postalCode": "92706",
                "latitude": 33.767371,
                "longitude": -117.868255
              }
            }
          ]
        }
      }
    }
]

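My question is: why does the last bucket have the key "santa" even though I searched for "ana", and why does the same city, "Santa Ana" (id = 154), appear in two different buckets (key "ana" and key "santa")?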

Best answer

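This happens because the cityName field is analyzed: when Santa Ana is indexed, two tokens, santa and ana, are produced, and the terms aggregation uses those tokens as bucket keys. The query only decides which documents are aggregated; the aggregation then creates a bucket for every token of each matching document, which is why santa shows up even though you searched for ana.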
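
The effect can be reproduced outside Elasticsearch with a small simulation. This is only a sketch: `analyze` is a rough stand-in for the default analyzer (split on non-alphanumeric characters, lowercase), `terms_buckets` is a hypothetical helper, and the sample list simply mirrors the document counts from the question.

```python
import re
from collections import Counter

def analyze(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def terms_buckets(city_names, analyzed=True):
    """Count documents per indexed term, the way a terms aggregation keys its buckets."""
    counts = Counter()
    for name in city_names:
        # An analyzed field contributes one term per token; a not_analyzed
        # field contributes the whole value as a single term.
        terms = set(analyze(name)) if analyzed else {name}
        counts.update(terms)
    return dict(counts)

# Assumed sample: the documents that already matched the "ana" query.
matched = ["Anaheim"] * 11 + ["Santa Ana"] * 4

print(terms_buckets(matched))                  # analyzed: three buckets
print(terms_buckets(matched, analyzed=False))  # not analyzed: two buckets
```

With the analyzed field, each "Santa Ana" document is counted once under santa and once under ana, which matches the duplicated hit with id 154 in the question.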

To prevent this, you need to define the cityName field as not analyzed, like this:

PUT search
{
    "mappings": {
        "City": {
            "properties": {
                "cityName": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

You first need to delete the index, recreate it with the mapping above, and then reindex your data. Only then will your bucket keys be Anaheim and Santa Ana.
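
The delete, recreate and reindex sequence could be scripted roughly as follows. This is a sketch under assumptions, not a definitive procedure: the node address `http://localhost:9200`, the helper names `bulk_body` and `rebuild_requests`, and the idea that the source documents are available as a list of dicts are all hypothetical. The code only builds the three HTTP requests in order; nothing is sent until each one is passed to `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # assumption: a local Elasticsearch node
JSON_HEADERS = {"Content-Type": "application/json"}

# The not_analyzed mapping from the answer above.
MAPPING = {
    "mappings": {
        "City": {
            "properties": {
                "cityName": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

def bulk_body(docs):
    """Build a newline-delimited _bulk payload: an action line, then a source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_type": "City", "_id": str(doc["id"])}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the payload to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")

def rebuild_requests(docs):
    """Return the requests in the order they must run: delete, recreate, reindex."""
    return [
        Request(BASE + "/search", method="DELETE"),
        Request(BASE + "/search", data=json.dumps(MAPPING).encode("utf-8"),
                headers=JSON_HEADERS, method="PUT"),
        Request(BASE + "/search/_bulk", data=bulk_body(docs),
                headers={"Content-Type": "application/x-ndjson"}, method="POST"),
    ]

# Hypothetical source data; in practice this comes from wherever the cities were loaded from.
sample = [{"id": 154, "cityName": "Santa Ana"}]
for request in rebuild_requests(sample):
    print(request.get_method(), request.full_url)
```

Deleting the index is destructive, so the original data must be available from another source before the first request is sent.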

Update

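If you want cityName to stay analyzed but still get a single bucket per city in the aggregation, you can define a multi-field, in which one part is analyzed and the other is not, like this: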

PUT search
{
    "mappings": {
        "City": {
            "properties": {
                "cityName": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}

This way cityName is still analyzed, but you now also have cityName.raw, which is not analyzed and can be used in the aggregation like this:

    "terms": {
        "field": "cityName.raw"
    },
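
The division of labor between the two sub-fields can be sketched as a small simulation: the query matches against the analyzed tokens, while the aggregation groups on the untouched raw value. The helpers `analyze`, `index_city` and `matches_prefix` are hypothetical, the prefix check is a simplified single-term version of phrase_prefix, and the city list is made-up sample data.

```python
import re
from collections import Counter

def analyze(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def index_city(name):
    """Index one source value two ways, as the multi-field mapping does."""
    return {"cityName": analyze(name), "cityName.raw": name}

def matches_prefix(doc, prefix):
    """Simplified single-term phrase_prefix: some analyzed token starts with the prefix."""
    return any(token.startswith(prefix.lower()) for token in doc["cityName"])

# Assumed sample data.
docs = [index_city(name) for name in ["Anaheim", "Santa Ana", "Santa Ana", "San Diego"]]

# Search on the analyzed field, then bucket the hits on the raw field.
hits = [doc for doc in docs if matches_prefix(doc, "ana")]
buckets = Counter(doc["cityName.raw"] for doc in hits)
print(dict(buckets))
```

Each matching city now falls into exactly one bucket keyed by its full name, while a search for "ana" still finds "Santa Ana" through its second token.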

For "elasticsearch - Elasticsearch returns duplicate results", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37103811/