What happened:
Elasticsearch suddenly stopped working while I was running repeated queries.
I increased the heap size from 1GB to 2GB, and then to 4GB, but it didn't help.
Current heap usage is only 20% of the allocated 4GB, so why is ES failing with an OutOfMemoryError?
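(For reference, heap usage and GC activity can be sampled directly from the node-stats API; a minimal check, with filter_path trimming the response:)

# report heap_used_percent and per-collector GC counters for each node
curl -s 'localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors'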

Elasticsearch logs:

[2019-11-11T11:12:16,654][INFO ][o.e.c.r.a.AllocationService] [es-stg] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.kibana_1][0]] ...])
[2019-11-11T11:12:51,447][INFO ][o.e.c.m.MetaDataIndexTemplateService] [es-stg] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2019-11-11T11:13:10,527][INFO ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][71] overhead, spent [418ms] collecting in the last [1s]
[2019-11-11T11:13:16,619][INFO ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][77] overhead, spent [313ms] collecting in the last [1s]
[2019-11-11T11:13:21,187][WARN ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][80] overhead, spent [2.4s] collecting in the last [2.5s]
[2019-11-11T11:13:25,396][WARN ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][83] overhead, spent [2s] collecting in the last [2.1s]
[2019-11-11T11:13:27,983][WARN ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][84] overhead, spent [2.3s] collecting in the last [2.6s]
[2019-11-11T11:13:30,029][WARN ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][85] overhead, spent [2s] collecting in the last [2s]
[2019-11-11T11:13:34,184][WARN ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][86] overhead, spent [4.1s] collecting in the last [4.1s]
[2019-11-11T11:14:31,155][WARN ][o.e.c.InternalClusterInfoService] [es-stg] Failed to update node information for ClusterInfoUpdateJob within 15s timeout
[2019-11-11T11:14:31,172][WARN ][o.e.m.j.JvmGcMonitorService] [es-stg] [gc][87] overhead, spent [18.2s] collecting in the last [18.3s]
[2019-11-11T11:14:31,215][ERROR][o.e.x.m.c.i.IndexStatsCollector] [es-stg] collector [index-stats] timed out when collecting data
[2019-11-11T11:14:31,210][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-stg] fatal error in thread [elasticsearch[es-stg][search][T#6]], exiting
java.lang.OutOfMemoryError: Java heap space

Specs:
Ubuntu: 16.04
RAM: 8GB
JVM heap: 4GB

Results of:
http://localhost:9200/_cat/allocation
    84 4.6gb 55.1gb 22.2gb 77.3gb 71 206.189.140.50 206.189.140.50 es-stg
    42                                                          UNASSIGNED
http://localhost:9200/_cat/fielddata?v
    id                     host ip node   field                       size
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg shard.node                 2.7kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg transaction.type            704b
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg transaction.name.keyword     1kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg kibana_stats.kibana.status   2kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg beat.hostname              5.8kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg transaction.result          704b
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg kibana_stats.kibana.uuid     2kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg source_node.name           2.7kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg shard.index               12.1kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg shard.state                6.6kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg context.service.agent.name 2.2kb
    o_KWnYBuR-aimAl1VUtygA ip   ip es-stg source_node.uuid           2.7kb

http://localhost:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
ip           18          98   2    0.04    0.08     0.06 mdi       *      es-stg

http://localhost:9200/_cluster/settings
{"persistent":{"xpack":{"monitoring":{"collection":{"enabled":"true"}}}},"transient":{}}

Expected:
Elasticsearch should run normally.
(Could low disk space be related to this problem?)

Best answer

Allocation

The best rule of thumb is to keep the number of shards per node below 20 to 25 per GB of configured heap.
Example: a node with a 30GB heap should therefore have at most 600-750 shards.
Shards should be no larger than 50GB; 25GB is a reasonable target for large shards.
Keep each shard smaller than 40% of the data node's disk size (see the quick check below).
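Applied to this cluster: a 4GB heap allows roughly 80-100 shards by that rule, while the _cat/allocation output above already shows 84 assigned shards plus 42 unassigned, so the node is at or past the limit. A quick way to count shards per node (a sketch using the standard _cat/shards API; unassigned shards show up with an empty node column):

# one line per shard, grouped by node; compare the count with heap_gb * 20
curl -s 'localhost:9200/_cat/shards?h=node' | sort | uniq -c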

Check per-node allocation

curl localhost:9200/_cat/allocation?v

https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

Lock the process address space into RAM and avoid swapping

Add this line to config/elasticsearch.yml:
bootstrap.memory_lock: true

https://www.elastic.co/guide/en/elasticsearch/reference/current/_memory_lock_check.html
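After a restart, whether the lock actually took effect can be verified through the node info API (this check comes from the Elastic docs linked above):

# prints "mlockall" : true per node when the lock succeeded
curl -s 'localhost:9200/_nodes?filter_path=**.mlockall'

On a systemd-based Ubuntu 16.04 install, the memlock ulimit must also permit this (for example LimitMEMLOCK=infinity in a systemd override for the elasticsearch unit); otherwise Elasticsearch logs "Unable to lock JVM Memory" at startup and mlockall stays false.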

Regarding this Elasticsearch Java heap space problem, the original question on Stack Overflow is: https://stackoverflow.com/questions/58800603/
