I have the following search/City index, whose documents have a name and many other attributes. I run the following aggregated search:
{
"size": 0,
"query": {
"multi_match" : {
"query": "ana",
"fields": [ "cityName" ],
"type" : "phrase_prefix"
}
},
"aggs": {
"res": {
"terms": {
"field": "cityName"
},
"aggs":{
"dedup_docs":{
"top_hits":{
"size":1
}
}
}
}
}
}
As a result, I get three buckets with the keys "anaheim", "ana" and "santa". The result is as follows:
"buckets": [
{
"key": "anaheim",
"doc_count": 11,
"dedup_docs": {
"hits": {
"total": 11,
"max_score": 5.8941016,
"hits": [
{
"_index": "search",
"_type": "City",
"_id": "310",
"_score": 5.8941016,
"_source": {
"id": 310,
"country": "USA",
"stateCode": "CA",
"stateName": "California",
"cityName": "Anaheim",
"postalCode": "92806",
"latitude": 33.822738,
"longitude": -117.881633
}
}
]
}
}
},
{
"key": "ana",
"doc_count": 4,
"dedup_docs": {
"hits": {
"total": 4,
"max_score": 2.933612,
"hits": [
{
"_index": "search",
"_type": "City",
"_id": "154",
"_score": 2.933612,
"_source": {
"id": 154,
"country": "USA",
"stateCode": "CA",
"stateName": "California",
"cityName": "Santa Ana",
"postalCode": "92706",
"latitude": 33.767371,
"longitude": -117.868255
}
}
]
}
}
},
{
"key": "santa",
"doc_count": 4,
"dedup_docs": {
"hits": {
"total": 4,
"max_score": 2.933612,
"hits": [
{
"_index": "search",
"_type": "City",
"_id": "154",
"_score": 2.933612,
"_source": {
"id": 154,
"country": "USA",
"stateCode": "CA",
"stateName": "California",
"cityName": "Santa Ana",
"postalCode": "92706",
"latitude": 33.767371,
"longitude": -117.868255
}
}
]
}
}
}
]
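My question is: why does the last bucket have the key "santa" even though I searched for "ana", and why does the same city, "Santa Ana" (id = 154), appear in two different buckets (key "ana" and key "santa")?
Best answer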
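This happens mainly because the cityName field is analyzed. When Santa Ana is indexed, two tokens, santa and ana, are generated, and the terms aggregation builds its buckets from those tokens rather than from the original value.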
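The effect can be reproduced outside Elasticsearch with a small simulation. This is only a sketch: `analyze` is a rough stand-in for the default analyzer (split on non-alphanumeric characters, then lowercase), not Elasticsearch's actual implementation, and `terms_buckets` is a hypothetical helper name. The document counts (11 and 4) are taken from the results in the question.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for an analyzed string field: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def terms_buckets(city_names, analyzed=True):
    """Count documents per term, the way a terms aggregation builds buckets."""
    counts = Counter()
    for name in city_names:
        # A document is counted once for each distinct term it contains.
        terms = set(analyze(name)) if analyzed else {name}
        counts.update(terms)
    return dict(counts)


# 11 "Anaheim" documents and 4 "Santa Ana" documents, as in the question.
docs = ["Anaheim"] * 11 + ["Santa Ana"] * 4

print(terms_buckets(docs, analyzed=True))   # buckets keyed by token
print(terms_buckets(docs, analyzed=False))  # buckets keyed by the whole value
```

With `analyzed=True`, each "Santa Ana" document lands in both the santa bucket and the ana bucket, which is why document 154 shows up twice. With `analyzed=False`, the whole value is the only term, so each city gets exactly one bucket.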
If you want to prevent this, you need to define the cityName field like this:
PUT search
{
"mappings": {
"City": {
"properties": {
"cityName": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
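You first need to delete the index, recreate it with the mapping above, and then reindex your data. Only then will your bucket keys be Anaheim and Santa Ana.
UPDATE
If you want cityName to stay analyzed but still get a single bucket per city in the aggregation, you can define a multi-field, in which one part is analyzed and the other is not, like this:
PUT search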
{
"mappings": {
"City": {
"properties": {
"cityName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
This way, cityName is still analyzed, but you now also have cityName.raw, which is not analyzed and can be used in the aggregation like this:
"terms": {
"field": "cityName.raw"
},
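Putting the pieces together, the original request only needs its aggregation field changed; the query can keep matching on the analyzed cityName. The sketch below builds that request body as a Python dictionary and prints it as JSON. `build_city_query` is a hypothetical helper name, and it assumes the multi-field mapping above is already in place and the data has been reindexed.

```python
import json


def build_city_query(prefix, agg_field="cityName.raw"):
    """Build the search body: match on analyzed cityName, bucket on the raw sub-field."""
    return {
        "size": 0,
        "query": {
            "multi_match": {
                "query": prefix,
                "fields": ["cityName"],   # analyzed field, so prefix search still works
                "type": "phrase_prefix",
            }
        },
        "aggs": {
            "res": {
                "terms": {"field": agg_field},  # not_analyzed sub-field: one bucket per city
                "aggs": {"dedup_docs": {"top_hits": {"size": 1}}},
            }
        },
    }


body = build_city_query("ana")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same search request used in the question.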
About elasticsearch - Elasticsearch gives duplicate results: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37103811/