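I take an Elasticsearch backup every hour. After about 370 backups (roughly 15 days), my backup repository has grown past 15G, yet the total index size is only about 500M. Elasticsearch snapshots are incremental, but 15G versus 500M is a huge gap. Is it normal for the backup repository to be this much larger than the indices?
Is it caused by how often I back up (once an hour)? I take an hourly backup on cluster 1 and run an hourly restore on cluster 2 to keep the data in the two ES clusters close to identical in near real time.
I am using the Elasticsearch snapshot API for the backups: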
# Register a shared-filesystem repository named "backup"
curl -XPUT "http://IP:9200/_snapshot/backup" -d '{"type":"fs","settings":{"compress":true,"location":"backup"}}'
# Build a YYYYmmddHHMMSS timestamp (strip "-", ":" and the space)
CURTIME=$(date +"%Y-%m-%d %H:%M:%S")
BKTIME=${CURTIME//[-: ]/}
# Take a snapshot and wait for it to finish
curl -XPUT "http://IP:9200/_snapshot/backup/snapshot_${BKTIME}?wait_for_completion=true"
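My Elasticsearch setup: 2 nodes, 12 shards per node, 2 indices, and the fs repository type storing snapshots on a NAS.
In the Elasticsearch data directory, the index sizes are:
node1 index size:
[root@esnode1 indices]$ du -sh
307M .
node2 index size:
[root@esnode2 indices]$ du -sh
238M .
[root@esnode1 indices]$ du -lh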
8.0K ./index1/10/translog
8.0K ./index1/10/_state
2.9M ./index1/10/index
2.9M ./index1/10
12K ./index1/5/translog
8.0K ./index1/5/_state
1.5M ./index1/5/index
1.5M ./index1/5
8.0K ./index1/4/translog
8.0K ./index1/4/_state
2.9M ./index1/4/index
2.9M ./index1/4
8.0K ./index1/_state
8.0K ./index1/7/translog
8.0K ./index1/7/_state
2.9M ./index1/7/index
2.9M ./index1/7
8.0K ./index1/1/translog
8.0K ./index1/1/_state
2.9M ./index1/1/index
2.9M ./index1/1
8.0K ./index1/2/translog
8.0K ./index1/2/_state
2.9M ./index1/2/index
2.9M ./index1/2
8.0K ./index1/6/translog
8.0K ./index1/6/_state
3.0M ./index1/6/index
3.0M ./index1/6
8.0K ./index1/0/translog
8.0K ./index1/0/_state
1.5M ./index1/0/index
1.5M ./index1/0
8.0K ./index1/8/translog
8.0K ./index1/8/_state
1.5M ./index1/8/index
1.5M ./index1/8
8.0K ./index1/11/translog
8.0K ./index1/11/_state
2.9M ./index1/11/index
2.9M ./index1/11
12K ./index1/9/translog
8.0K ./index1/9/_state
3.0M ./index1/9/index
3.0M ./index1/9
8.0K ./index1/3/translog
8.0K ./index1/3/_state
3.0M ./index1/3/index
3.0M ./index1/3
31M ./index1
16K ./index2/10/translog
8.0K ./index2/10/_state
16M ./index2/10/index
16M ./index2/10
36K ./index2/5/translog
8.0K ./index2/5/_state
43M ./index2/5/index
43M ./index2/5
20K ./index2/4/translog
8.0K ./index2/4/_state
17M ./index2/4/index
18M ./index2/4
8.0K ./index2/_state
40K ./index2/7/translog
8.0K ./index2/7/_state
32M ./index2/7/index
32M ./index2/7
68K ./index2/1/translog
8.0K ./index2/1/_state
21M ./index2/1/index
21M ./index2/1
64K ./index2/2/translog
8.0K ./index2/2/_state
19M ./index2/2/index
19M ./index2/2
116K ./index2/6/translog
8.0K ./index2/6/_state
22M ./index2/6/index
22M ./index2/6
24K ./index2/0/translog
8.0K ./index2/0/_state
17M ./index2/0/index
17M ./index2/0
128K ./index2/8/translog
8.0K ./index2/8/_state
34M ./index2/8/index
34M ./index2/8
72K ./index2/11/translog
8.0K ./index2/11/_state
20M ./index2/11/index
20M ./index2/11
88K ./index2/9/translog
8.0K ./index2/9/_state
22M ./index2/9/index
22M ./index2/9
76K ./index2/3/translog
8.0K ./index2/3/_state
16M ./index2/3/index
16M ./index2/3
277M ./index2
307M .
In the backup repository, the sizes are:
[root@esnode1 backup]$ du -lh
114M ./backup/indices/index1/0
112M ./backup/indices/index1/5
114M ./backup/indices/index1/11
114M ./backup/indices/index1/10
111M ./backup/indices/index1/8
116M ./backup/indices/index1/4
120M ./backup/indices/index1/9
118M ./backup/indices/index1/3
114M ./backup/indices/index1/2
115M ./backup/indices/index1/7
115M ./backup/indices/index1/1
112M ./backup/indices/index1/6
1.4G ./backup/indices/index1
747M ./backup/indices/index2/0
1.6G ./backup/indices/index2/5
887M ./backup/indices/index2/11
743M ./backup/indices/index2/10
2.1G ./backup/indices/index2/8
801M ./backup/indices/index2/4
1.3G ./backup/indices/index2/9
878M ./backup/indices/index2/3
951M ./backup/indices/index2/2
1.2G ./backup/indices/index2/7
953M ./backup/indices/index2/1
943M ./backup/indices/index2/6
13G ./backup/indices/index2
15G ./backup/indices
15G ./backup
1.1M ./backuplogs
15G .
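To put these numbers in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch that uses the rounded figures quoted in this post (a 15G repository, about 500M of index data, 370 snapshots); the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Rough arithmetic on the figures quoted in the question (integer MB, rounded down).
repo_mb=$(( 15 * 1024 ))   # repository size: about 15G
index_mb=500               # total index size: about 500M
snapshots=370              # number of hourly snapshots taken so far

per_snapshot_mb=$(( repo_mb / snapshots ))   # average repository growth per snapshot
ratio=$(( repo_mb / index_mb ))              # repository size relative to the index

echo "average growth per snapshot: ${per_snapshot_mb} MB"
echo "repository is roughly ${ratio}x the index size"
```

On average each hourly snapshot added about 41 MB, which is less than a tenth of the index, so the snapshots are not full copies even though the repository is about 30 times the index size.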
======
https://www.elastic.co/blog/introducing-snapshot-restore
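Both backup and restore operations are incremental, meaning that only files that have changed since the last snapshot are copied into the repository or restored into an index. Incremental snapshots allow snapshot operations to be run as often as needed without much disk space overhead. Users can now easily create a snapshot before an upgrade or a risky change to the cluster, and quickly roll back to the previous index state if something goes wrong. The snapshot/restore mechanism can also be used to synchronize data between a "hot" cluster and a remote "cold" backup cluster in a different geographic region for fast disaster recovery.
Given the above, my case really does look like a problem. Can anyone help? Thanks in advance!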
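Best answer
Confirmed on the official Elasticsearch forum:
1) In my case, the large size difference between the indices and the backup repository (500M vs 15G) is a normal result.
2) The redundant data in the backup snapshots is caused by Lucene segment merging.
From an Elasticsearch expert: if you are constantly indexing into the cluster, segment merges happen continuously in the background, and over time the same record ends up in several different segments, which makes the repository much larger than the index. Snapshots are incremental at the level of segment files, so a segment produced by a merge is a new file and is copied again even though the documents in it were already backed up.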
https://discuss.elastic.co/t/backup-repository-size-is-much-bigger-than-indices-size/47469/7
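If the repository size becomes a problem, one option is to delete old snapshots; when a snapshot is deleted, Elasticsearch removes the files that no other snapshot still references. Below is a minimal sketch of such a retention script, not something taken from the forum thread. It assumes snapshots are named snapshot_<timestamp> as in the question, that jq is installed, and that keeping the newest 48 snapshots is acceptable. ES_URL, REPO, KEEP, select_expired and prune_snapshots are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: keep only the newest $KEEP snapshots in a repository.
# Assumptions: timestamped snapshot names, jq available, placeholder host "IP".
ES_URL="${ES_URL:-http://IP:9200}"
REPO="${REPO:-backup}"
KEEP="${KEEP:-48}"

# Read snapshot names on stdin and print all but the newest $1 of them.
# Timestamped names sort chronologically, so a plain sort is enough.
select_expired() {
  local keep="$1"
  sort | awk -v keep="$keep" '
    { names[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# List every snapshot in the repository and delete the expired ones.
prune_snapshots() {
  curl -s "$ES_URL/_snapshot/$REPO/_all" \
    | jq -r '.snapshots[].snapshot' \
    | select_expired "$KEEP" \
    | while read -r name; do
        echo "deleting $name"
        curl -s -XDELETE "$ES_URL/_snapshot/$REPO/$name"
      done
}
```

Running prune_snapshots after each hourly backup (for example from the same cron job) should keep the repository from growing without bound. Nothing is deleted until that function is called, so the selection logic can be checked first by piping names into select_expired.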