Problem description
While experimenting with fielddata vs. doc_values, I encountered a weird case. In my earlier mapping, I didn't use doc values at all. In my new mapping, I've added doc_values: true to all fields except analyzed string fields and booleans (doc values are not supported for booleans until 2.0).
In detail, here is how I proceeded:
Before reindexing all my data, I restarted my ES 1.7 cluster and ran a query with sorting, aggregations and script fields to "warm up" the fielddata cache. Then I queried the /_cat/fielddata endpoint to get an idea of the fielddata cache usage. It looked something like this:
curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'
id      host   ip            node total  items.desc.raw more_fields...
rKX7... myhost 192.168.1.100 Doom 32.9mb 2.3mb          ...
As you can see, the field items.desc.raw used 2.3mb of heap space. items is of type nested and contains a string multi-field with a not_analyzed sub-field called raw. In short, the mapping of that nested field looks like this:
"items": {
  "type": "nested",
  "properties": {
    "desc": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
After adding doc_values: true to items.desc.raw, reindexing the whole index, and again running some aggregations, sorting and scripting to warm up the fielddata cache, I queried the /_cat/fielddata endpoint again. Here was the result:
curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'
id      host   ip            node total items.desc.raw some_bools...
tAB5... myhost 192.168.1.100 Yack 2.1mb 9.2kb          ...
So the fielddata usage has indeed dropped drastically (which is good). The only fields I expected to see were boolean fields (e.g. some_bools above), but to my surprise, my nested not_analyzed string field also appeared, though with much lower space usage.
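For reference, the mapping change described above (adding doc_values: true to items.desc.raw) can be sketched as follows. This is only an illustration written as a Python dict and serialized to JSON; the variable name items_mapping and the helper to_json are hypothetical, and the field names are taken from the mapping shown earlier.

```python
import json

# Mapping fragment for the nested "items" field from the question,
# with doc_values enabled on the not_analyzed "raw" sub-field.
# (items_mapping and to_json are illustrative names, not part of any API.)
items_mapping = {
    "items": {
        "type": "nested",
        "properties": {
            "desc": {
                "type": "string",
                "fields": {
                    "raw": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    }
                },
            }
        },
    }
}


def to_json(mapping):
    """Serialize the mapping fragment to pretty-printed JSON."""
    return json.dumps(mapping, indent=2)


print(to_json(items_mapping))
```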
What could cause items.desc.raw to still appear in the fielddata cache?
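Somehow I forgot about global ordinals. They are the reason why I'm still getting fielddata usage even after enabling doc_values, as global ordinals cannot be stored in doc_values. They are built at search time on top of the per-segment ordinals and kept on the heap, where they are accounted for as part of the fielddata cache.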
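To make the idea of global ordinals more concrete, here is a deliberately simplified toy model. It is not how Lucene or Elasticsearch implement them; the function build_global_ordinals and the sample terms are made up for illustration. Each segment has its own sorted term dictionary with segment-local ordinals, and a per-segment lookup table translates those into one index-wide numbering. It is this kind of cross-segment lookup table that has to be computed at search time rather than stored per segment.

```python
def build_global_ordinals(segments):
    """Toy model of global ordinals (illustrative only).

    segments: list of sorted lists of unique terms, one list per segment.
              A term's position in its list is its segment-local ordinal.
    Returns (global_terms, mappings), where mappings[i][seg_ord] is the
    global ordinal of the term with local ordinal seg_ord in segment i.
    """
    # One sorted dictionary of every unique term across all segments.
    global_terms = sorted({term for seg in segments for term in seg})
    global_ord = {term: i for i, term in enumerate(global_terms)}
    # Per-segment lookup tables: local ordinal -> global ordinal.
    mappings = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, mappings


# Two hypothetical segments with overlapping terms.
segments = [["apple", "cherry"], ["banana", "cherry", "date"]]
global_terms, mappings = build_global_ordinals(segments)
print(global_terms)
print(mappings)
```

In this model, "cherry" has local ordinal 1 in the first segment and also 1 in the second, but a single global ordinal, which is what lets an aggregation count terms across segments without comparing strings.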