Problem description
I have an index mapping with two string fields, field1 and field2, both declared with copy_to into another field called all_fields. all_fields is indexed as not_analyzed. When I run a terms (bucket) aggregation on all_fields, I expected distinct buckets whose keys are field1 and field2 concatenated together. Instead, I get separate buckets keyed on the individual, unconcatenated values of field1 and field2.
Example:
mapping:
{
  "mappings": {
    "myobject": {
      "properties": {
        "field1": {
          "type": "string",
          "index": "analyzed",
          "copy_to": "all_fields"
        },
        "field2": {
          "type": "string",
          "index": "analyzed",
          "copy_to": "all_fields"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
data:
{
  "field1": "dinner carrot potato broccoli",
  "field2": "something here"
}
and
{
  "field1": "fish chicken something",
  "field2": "dinner"
}
aggregation:
{
  "aggs": {
    "t": {
      "terms": {
        "field": "all_fields"
      }
    }
  }
}
results:
...
"aggregations": {
  "t": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "dinner",
        "doc_count": 1
      },
      {
        "key": "dinner carrot potato broccoli",
        "doc_count": 1
      },
      {
        "key": "fish chicken something",
        "doc_count": 1
      },
      {
        "key": "something here",
        "doc_count": 1
      }
    ]
  }
}
I was expecting only 2 buckets, "fish chicken something dinner" and "dinner carrot potato broccoli something here".
What am I doing wrong?
Solution
What you are looking for is concatenation of two strings. copy_to may look like it does this, but it does not. With copy_to you are, conceptually, creating a set of values collected from both field1 and field2, not concatenating them.
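To illustrate (this is a sketch of the effective indexed values, not output from any API), the two example documents end up with a multi-valued all_fields rather than a single concatenated string:
{ "all_fields": ["dinner carrot potato broccoli", "something here"] }   (document 1)
{ "all_fields": ["fish chicken something", "dinner"] }                  (document 2)
The terms aggregation then treats each value as its own bucket key, which is exactly the four-bucket result shown above.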
For your use case, you have two options:
- use a _source transformation
- perform a script aggregation
I would recommend the _source transformation, as I think it's more efficient than scripting: you pay a small price at indexing time instead of running a heavy scripted aggregation at query time.
For the _source transformation:
PUT /lastseen
{
  "mappings": {
    "test": {
      "transform": {
        "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
      },
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
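As a quick sketch (the document IDs here are arbitrary), the two documents from the question could be indexed like this; the transform then fills in all_fields at index time:
PUT /lastseen/test/1
{
  "field1": "dinner carrot potato broccoli",
  "field2": "something here"
}
PUT /lastseen/test/2
{
  "field1": "fish chicken something",
  "field2": "dinner"
}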
And the query:
GET /lastseen/test/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "all_fields",
        "size": 10
      }
    }
  }
}
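With the transform in place, the aggregation should return just the two concatenated keys the question was asking for, roughly like this (response abridged; the metadata fields are omitted for clarity):
"aggregations": {
  "NAME": {
    "buckets": [
      { "key": "dinner carrot potato broccoli something here", "doc_count": 1 },
      { "key": "fish chicken something dinner", "doc_count": 1 }
    ]
  }
}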
For the script aggregation, to make it easier (meaning you can use doc['field'].value rather than the more expensive _source.field), add .raw sub-fields to field1 and field2:
PUT /lastseen
{
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}
And the script aggregation will use these .raw sub-fields:
{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value",
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}
Without the .raw sub-fields (which are not_analyzed on purpose), you would have needed to do something like the following, which is more expensive because it loads and parses the stored _source for every document instead of reading field data:
{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "_source.field1 + ' ' + _source.field2",
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}