Problem description
I'm trying to find tags related to the one currently being viewed. Every document in our index is tagged. Each tag has two parts, an ID and a text name:
{
...
meta: {
...
tags: [
{
id: 123,
name: 'Biscuits'
},
{
id: 456,
name: 'Cakes'
},
{
id: 789,
name: 'Breads'
}
]
}
}
To fetch the related tags I simply query the documents and get an aggregate of their tags:
{
"query": {
"bool": {
"must": [
{
"match": {
"item.meta.tags.id": "123"
}
},
{
...
}
]
}
},
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
}
}
}
}
This works perfectly and I get the results I want. However, I need both the tag ID and the name to do anything useful. From what I have found, the possible solutions are:
- Combining the fields at index time
- A script that pieces the fields together
- A nested aggregation
Options one and two are not available to me, so I have been going with option three, but it is not responding as expected. Given the following query (still searching for documents also tagged with 'Biscuits'):
{
...
"aggs": {
"baked_goods": {
"terms": {
"field": "item.meta.tags.id",
"min_doc_count": 2
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
I get this result:
{
...
"aggregations": {
"baked_goods": {
"buckets": [
{
"key": "456",
"doc_count": 11,
"name": {
"buckets": [
{
"key": "Biscuits",
"doc_count": 11
},
{
"key": "Cakes",
"doc_count": 11
}
]
}
}
]
}
}
}
The nested aggregation includes both the search term and the tag I'm after (returned in alphabetical order).
I tried to mitigate this by adding an exclude to the nested aggregation, but this slowed the query down far too much (around 100 times slower for 500000 docs). So far the fastest solution is to de-dupe the result manually.
What is the best way to get an aggregation of tags with both the tag ID and tag name in the response?
Thanks!
Recommended answer
By the looks of it, your tags field is not nested. For this aggregation to work, it needs to be nested so that there is an association between an id and a name. Without nested, the list of ids is just one array and the list of names is another array:
"item": {
"properties": {
"meta": {
"properties": {
"tags": {
"type": "nested", <-- nested field
"include_in_parent": true, <-- to, also, keep the flat array-like structure
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
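To illustrate why a non-nested tags field loses the pairing between id and name, here is a minimal Python sketch. It is only a simplified model of the flattening, not Elasticsearch's actual storage code; the helper names `flatten_object_array` and `names_seen_for_id` are made up for this example, and the sample document mirrors the one in the question.

```python
# Hypothetical document shaped like the one in the question.
doc = {
    "meta": {
        "tags": [
            {"id": 123, "name": "Biscuits"},
            {"id": 456, "name": "Cakes"},
            {"id": 789, "name": "Breads"},
        ]
    }
}


def flatten_object_array(tags, prefix="meta.tags"):
    """Model a non-nested object array: one multi-valued field per key."""
    flat = {}
    for tag in tags:
        for key, value in tag.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


def names_seen_for_id(flat, tag_id):
    """Model a terms sub-aggregation on name under a bucket for tag_id.

    With the flat layout, a document that falls into the id bucket
    contributes ALL of its names, because nothing ties a name to an id.
    """
    if tag_id in flat["meta.tags.id"]:
        return sorted(flat["meta.tags.name"])
    return []


flat = flatten_object_array(doc["meta"]["tags"])
print(flat)
print(names_seen_for_id(flat, 456))
```

In this model the bucket for id 456 sees 'Biscuits', 'Breads' and 'Cakes' alike, which matches the symptom in the question where the search term shows up next to the wanted tag name.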
Also, note that the mapping above includes the line "include_in_parent": true, which means that your nested tags will also behave like a "flat" array-like structure.
So everything you have had in your queries so far will still work without any changes.
But for this particular query of yours, the aggregation needs to change to something like this:
{
"aggs": {
"baked_goods": {
"nested": {
"path": "item.meta.tags"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.id"
},
"aggs": {
"name": {
"terms": {
"field": "item.meta.tags.name"
}
}
}
}
}
}
}
}
The result looks like this:
"aggregations": {
"baked_goods": {
"doc_count": 9,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 123,
"doc_count": 3,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "biscuits",
"doc_count": 3
}
]
}
},
{
"key": 456,
"doc_count": 2,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cakes",
"doc_count": 2
}
]
}
},
.....
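As a rough sketch of how a client could turn that response into (id, name) pairs, assuming the response has already been parsed into a Python dict and keeps the aggregation names used above (an outer baked_goods nested aggregation, then two levels both called name). The function `tag_pairs` is a hypothetical helper, not part of any Elasticsearch client, and the sample data copies only the two buckets shown in the result.

```python
def tag_pairs(response):
    """Return (tag_id, tag_name, doc_count) tuples from the nested aggregation."""
    pairs = []
    id_buckets = response["aggregations"]["baked_goods"]["name"]["buckets"]
    for id_bucket in id_buckets:
        name_buckets = id_bucket["name"]["buckets"]
        # With nested tags each id bucket should hold exactly one name;
        # fall back to None if the name bucket list is empty.
        tag_name = name_buckets[0]["key"] if name_buckets else None
        pairs.append((id_bucket["key"], tag_name, id_bucket["doc_count"]))
    return pairs


# Sample trimmed to the two buckets visible in the result above.
sample_response = {
    "aggregations": {
        "baked_goods": {
            "doc_count": 9,
            "name": {
                "buckets": [
                    {
                        "key": 123,
                        "doc_count": 3,
                        "name": {"buckets": [{"key": "biscuits", "doc_count": 3}]},
                    },
                    {
                        "key": 456,
                        "doc_count": 2,
                        "name": {"buckets": [{"key": "cakes", "doc_count": 2}]},
                    },
                ]
            },
        }
    }
}

for tag_id, tag_name, count in tag_pairs(sample_response):
    print(tag_id, tag_name, count)
```

The lowercase names in the result presumably come from the name field being analyzed; if the original casing matters, a not-analyzed variant of the field would likely be needed, but that is outside what the answer shows.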