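I take an Elasticsearch backup every hour. After about 370 backups (roughly 15 days), my backup repository is over 15G, but the total index size is only about 500M! Elasticsearch backups are incremental, yet 15G vs 500M is a huge difference. Is it normal for the backup repository to be this much larger than the indices?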

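Is it caused by backing up so often (every hour)? I take an hourly backup in cluster 1 and do an hourly restore in cluster 2, so that the data in the two ES clusters stays identical in near real time.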

I'm using the Elasticsearch snapshot API for the backups:

  • Register the repository:
    curl -XPUT "http://IP:9200/_snapshot/backup" -d "{\"type\":\"fs\",\"settings\":{\"compress\":true,\"location\":\"backup\"}}"
  • Take a snapshot:
    CURTIME=$(date +"%Y-%m-%d %H:%M:%S")
    BKTIME=${CURTIME//[-: ]/}
    curl -XPUT "http://IP:9200/_snapshot/backup/snapshot_$BKTIME?wait_for_completion=true"

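  • My Elasticsearch setup: 2 nodes, 12 shards per node, 2 indices, and an fs repository that stores the snapshots on a NAS.

    Index sizes in the Elasticsearch data directory:

    node1 index size:
    [root@esnode1 indices]$ du -sh
    307M .

    node2 index size:
    [root@esnode2 indices]$ du -sh
    238M .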

    [root@esnode1 indices]$ du -lh
    8.0K ./index1/10/translog
    8.0K ./index1/10/_state
    2.9M ./index1/10/index
    2.9M ./index1/10
    12K ./index1/5/translog
    8.0K ./index1/5/_state
    1.5M ./index1/5/index
    1.5M ./index1/5
    8.0K ./index1/4/translog
    8.0K ./index1/4/_state
    2.9M ./index1/4/index
    2.9M ./index1/4
    8.0K ./index1/_state
    8.0K ./index1/7/translog
    8.0K ./index1/7/_state
    2.9M ./index1/7/index
    2.9M ./index1/7
    8.0K ./index1/1/translog
    8.0K ./index1/1/_state
    2.9M ./index1/1/index
    2.9M ./index1/1
    8.0K ./index1/2/translog
    8.0K ./index1/2/_state
    2.9M ./index1/2/index
    2.9M ./index1/2
    8.0K ./index1/6/translog
    8.0K ./index1/6/_state
    3.0M ./index1/6/index
    3.0M ./index1/6
    8.0K ./index1/0/translog
    8.0K ./index1/0/_state
    1.5M ./index1/0/index
    1.5M ./index1/0
    8.0K ./index1/8/translog
    8.0K ./index1/8/_state
    1.5M ./index1/8/index
    1.5M ./index1/8
    8.0K ./index1/11/translog
    8.0K ./index1/11/_state
    2.9M ./index1/11/index
    2.9M ./index1/11
    12K ./index1/9/translog
    8.0K ./index1/9/_state
    3.0M ./index1/9/index
    3.0M ./index1/9
    8.0K ./index1/3/translog
    8.0K ./index1/3/_state
    3.0M ./index1/3/index
    3.0M ./index1/3
    31M ./index1
    16K ./index2/10/translog
    8.0K ./index2/10/_state
    16M ./index2/10/index
    16M ./index2/10
    36K ./index2/5/translog
    8.0K ./index2/5/_state
    43M ./index2/5/index
    43M ./index2/5
    20K ./index2/4/translog
    8.0K ./index2/4/_state
    17M ./index2/4/index
    18M ./index2/4
    8.0K ./index2/_state
    40K ./index2/7/translog
    8.0K ./index2/7/_state
    32M ./index2/7/index
    32M ./index2/7
    68K ./index2/1/translog
    8.0K ./index2/1/_state
    21M ./index2/1/index
    21M ./index2/1
    64K ./index2/2/translog
    8.0K ./index2/2/_state
    19M ./index2/2/index
    19M ./index2/2
    116K ./index2/6/translog
    8.0K ./index2/6/_state
    22M ./index2/6/index
    22M ./index2/6
    24K ./index2/0/translog
    8.0K ./index2/0/_state
    17M ./index2/0/index
    17M ./index2/0
    128K ./index2/8/translog
    8.0K ./index2/8/_state
    34M ./index2/8/index
    34M ./index2/8
    72K ./index2/11/translog
    8.0K ./index2/11/_state
    20M ./index2/11/index
    20M ./index2/11
    88K ./index2/9/translog
    8.0K ./index2/9/_state
    22M ./index2/9/index
    22M ./index2/9
    76K ./index2/3/translog
    8.0K ./index2/3/_state
    16M ./index2/3/index
    16M ./index2/3
    277M ./index2
    307M .

    Size of the backup repository:
    [root@esnode1 backup]$ du -lh
    114M ./backup/indices/index1/0
    112M ./backup/indices/index1/5
    114M ./backup/indices/index1/11
    114M ./backup/indices/index1/10
    111M ./backup/indices/index1/8
    116M ./backup/indices/index1/4
    120M ./backup/indices/index1/9
    118M ./backup/indices/index1/3
    114M ./backup/indices/index1/2
    115M ./backup/indices/index1/7
    115M ./backup/indices/index1/1
    112M ./backup/indices/index1/6
    1.4G ./backup/indices/index1
    747M ./backup/indices/index2/0
    1.6G ./backup/indices/index2/5
    887M ./backup/indices/index2/11
    743M ./backup/indices/index2/10
    2.1G ./backup/indices/index2/8
    801M ./backup/indices/index2/4
    1.3G ./backup/indices/index2/9
    878M ./backup/indices/index2/3
    951M ./backup/indices/index2/2
    1.2G ./backup/indices/index2/7
    953M ./backup/indices/index2/1
    943M ./backup/indices/index2/6
    13G ./backup/indices/index2
    15G ./backup/indices
    15G ./backup
    1.1M ./backuplogs
    15G .
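A minimal sketch of the hourly cycle described in the question, wrapping the commands above as bash functions. It assumes both clusters have the same NAS-backed `backup` repository registered; `ES_SRC`, `ES_DST` and the helper function names are hypothetical. The restore step relies on the rule from the Elasticsearch snapshot/restore documentation that an existing index can only be restored while it is closed and has the same number of shards as the index in the snapshot.

```shell
#!/usr/bin/env bash
# Sketch of the hourly snapshot (cluster 1) / restore (cluster 2) cycle.
# ES_SRC, ES_DST and REPO are hypothetical placeholders; both clusters are
# assumed to have the same shared-NAS "fs" repository registered as REPO.
set -euo pipefail

ES_SRC="${ES_SRC:-http://IP:9200}"    # cluster 1: takes the snapshots
ES_DST="${ES_DST:-http://IP2:9200}"   # cluster 2: restores them (hypothetical host)
REPO="${REPO:-backup}"

# Build a name such as snapshot_20160415120000. Snapshot names must not
# contain whitespace, so strip '-', ':' and ' ' from the timestamp.
snapshot_name() {
  local curtime
  curtime="$(date +"%Y-%m-%d %H:%M:%S")"
  echo "snapshot_${curtime//[-: ]/}"
}

# Cluster 1: take a snapshot and block until it has finished.
take_snapshot() {
  curl -s -XPUT "${ES_SRC}/_snapshot/${REPO}/$(snapshot_name)?wait_for_completion=true"
}

# Print the newest snapshot name from a list on stdin. The fixed-width
# timestamp makes lexical order the same as chronological order.
latest_snapshot() {
  sort | tail -n 1
}

# Cluster 2: close the existing indices (an open index cannot be restored
# over), then restore the given snapshot; restored indices are reopened.
restore_snapshot() {
  local name="$1"
  curl -s -XPOST "${ES_DST}/index1,index2/_close"
  curl -s -XPOST "${ES_DST}/_snapshot/${REPO}/${name}/_restore?wait_for_completion=true"
}
```

A cron job on cluster 1 would call `take_snapshot`; on cluster 2 the newest name could be fetched with `curl -s "$ES_DST/_cat/snapshots/backup?h=id" | latest_snapshot` (assuming a version that provides the `_cat/snapshots` API) and passed to `restore_snapshot`.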

    ======
    https://www.elastic.co/blog/introducing-snapshot-restore
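    Both backup and restore operations are incremental, meaning that only the files that changed since the last snapshot are copied into the repository or restored into the index. Incremental snapshots allow snapshot operations to be performed as often as needed without using much disk space. Users can now easily create a snapshot before an upgrade or before making a risky change to the cluster, and quickly roll back to the previous index state if something goes wrong. The snapshot/restore mechanism can also be used to synchronize data between a "hot" cluster and a remote "cold" backup cluster in a different geographic region for fast disaster recovery.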

    Going by that description, my case really does look like a problem. Can anyone help? Thanks in advance!

    Best answer

    Confirmed on the official Elasticsearch forum:

    1) In my case, the large size difference between the indices and the backup repository (500M vs 15G) is a normal result.

    2) Some of the redundant data in the backup snapshots is caused by Lucene segment merging.

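    From an Elasticsearch expert: if you are indexing into the cluster continuously, segment merges happen continuously in the background, and over time the same record ends up in several different segments, each of which is copied to the repository as a new file. As a result, the repository can grow far larger than the index itself.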
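A toy model (plain bash, not Elasticsearch code) of the effect described above. It assumes a purely file-level incremental copy: a snapshot copies every segment file whose name is not yet in the repository, and a merge rewrites the same documents into a new file. The directory layout and all helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model only, not Elasticsearch code: "segments" are text files with one
# line per document, and a "snapshot" copies every segment file whose name
# is not already in the repository (file-level incremental copy).
set -euo pipefail

INDEX_DIR="$(mktemp -d)"   # stands in for a shard's index/ directory
REPO_DIR="$(mktemp -d)"    # stands in for the snapshot repository
trap 'rm -rf "$INDEX_DIR" "$REPO_DIR"' EXIT

# Write a new segment file seg_$1 containing $2 one-line documents.
add_segment() { seq 1 "$2" > "$INDEX_DIR/seg_$1"; }

# Incremental snapshot: copy only the segment files the repository lacks.
snapshot() {
  local f
  for f in "$INDEX_DIR"/seg_*; do
    [[ -e "$REPO_DIR/$(basename "$f")" ]] || cp "$f" "$REPO_DIR/"
  done
}

# Background merge: rewrite all live segments into one new file seg_$1 and
# delete the old ones. The documents are unchanged; only the file is new.
merge_all() {
  local merged
  merged="$(mktemp)"
  cat "$INDEX_DIR"/seg_* > "$merged"
  rm -f "$INDEX_DIR"/seg_*
  mv "$merged" "$INDEX_DIR/seg_$1"
}

# Count the document copies stored in the segment files under a directory.
count_docs() { cat "$1"/seg_* | wc -l | tr -d ' '; }
```

Adding two 100-document segments, taking a snapshot, merging them with `merge_all m1` and taking another snapshot leaves 200 documents in the index but 400 document copies in the repository. In a real repository, old segment files stay as long as any snapshot that references them is kept.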

    https://discuss.elastic.co/t/backup-repository-size-is-much-bigger-than-indices-size/47469/7

    09-05 21:22