PG Count Calculation
Original post: http://xiaqunfeng.cc/2017/09/15/too-many-PGs-per-OSD/
How to resolve the Ceph warning "too many PGs per OSD", and how to choose a reasonable number of PGs.
Symptom
```
# ceph -s
    cluster 4c7ec5af-cbd3-40fd-8c96-0615c77660d4
     health HEALTH_WARN
            too many PGs per OSD (412 > max 300)
     monmap e2: 3 mons at {ceph0=172.21.1.21:6789/0,ceph1=172.21.1.22:6789/0,ceph2=172.21.1.23:6789/0}
            election epoch 1780, quorum 0,1,2 ceph0,ceph1,ceph2
        mgr active: ceph0 standbys: ceph1, ceph2
     osdmap e94: 6 osds: 6 up, 6 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v161317: 824 pgs, 10 pools, 30201 MB data, 8642 objects
            90831 MB used, 181 GB / 269 GB avail
                 824 active+clean
  client io 34800 B/s wr, 0 op/s rd, 9 op/s wr
```
Cause
- The cluster has only a small number of OSDs.
- Components such as an RGW gateway, OpenStack, or container platforms create many pools; each pool needs some PGs by default, and unreasonable per-pool pg_num settings push the cluster's total PG count too high.

In this cluster the numbers work out as follows: 824 PGs with 3 replicas spread over 6 OSDs gives 824 * 3 / 6 = 412 PGs per OSD, which is what trips the warning threshold of 300.
Solution
Method
Raise the per-OSD PG warning threshold, which is controlled by the mon_pg_warn_max_per_osd option. The current default is:
```
# ceph --show-config | grep mon_pg_warn_max_per_osd
mon_pg_warn_max_per_osd = 300
```
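If restarting the monitors is inconvenient, the value can also be injected into the running monitors. This is a hedged sketch, not part of the original post; injected settings do not survive a restart, so the configuration-file change in the steps below is still needed:

```
# ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd 500'
```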
Steps
1. Edit the Ceph configuration file
```
# cd /etc/ceph
# vim ceph.conf
[global]
.......
mon_pg_warn_max_per_osd = 500
```
2. Push the configuration file to the other mon nodes
```
# ceph-deploy --overwrite-conf config push ceph1 ceph2
```
3. Restart the mon daemons
```
# systemctl restart ceph-mon.target
```
After the restart succeeds, check the option again:
```
# ceph --show-config | grep mon_pg_warn_max_per_osd
mon_pg_warn_max_per_osd = 500
```
The cluster status is now OK:
```
# ceph -s
    cluster 4c7ec5af-cbd3-40fd-8c96-0615c77660d4
     health HEALTH_OK
     monmap e2: 3 mons at {ceph0=172.21.1.21:6789/0,ceph1=172.21.1.22:6789/0,ceph2=172.21.1.23:6789/0}
            election epoch 1780, quorum 0,1,2 ceph0,ceph1,ceph2
        mgr active: ceph0 standbys: ceph1, ceph2
     osdmap e94: 6 osds: 6 up, 6 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v161317: 824 pgs, 10 pools, 30201 MB data, 8642 objects
            90831 MB used, 181 GB / 269 GB avail
                 824 active+clean
  client io 34800 B/s wr, 0 op/s rd, 9 op/s wr
```
Setting the Number of PGs
Gathering Information
1. Check the current number of OSDs
```
# ceph osd ls | wc -l
6
```
2. Check how many pools exist
```
# ceph osd pool ls | wc -l
10
```
3. Check the replicated pools and their replica sizes
```
# ceph osd dump | grep repli
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 31 flags hashpspool stripe_width 0
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 14 flags hashpspool stripe_width 0
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 16 flags hashpspool stripe_width 0
pool 3 'default.rgw.data.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 19 flags hashpspool stripe_width 0
pool 4 'default.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 20 flags hashpspool stripe_width 0
pool 5 'default.rgw.lc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 21 flags hashpspool stripe_width 0
pool 6 'default.rgw.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 23 flags hashpspool stripe_width 0
pool 7 'default.rgw.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 26 flags hashpspool stripe_width 0
pool 8 'kube' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 36 flags hashpspool stripe_width 0
pool 9 'stage' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 54 flags hashpspool stripe_width 0
```
As shown above, every pool uses 3 replicas (replicated size 3).
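A single pool's replica count can also be queried directly; for example (a small sketch, using the rbd pool from the listing above):

```
# ceph osd pool get rbd size
size: 3
```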
total pg num
The formula is:
```
Total PGs = (Total_number_of_OSD * 100) / max_replication_count
```
The result must be rounded to the nearest power of 2.
For example, with the figures above:
```
Total_number_of_OSD = 6
max_replication_count = 3
Total PGs = 200
```
The power of 2 closest to 200 is 256, so the recommended maximum total PG count for this cluster is 256.
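The arithmetic can also be scripted against the live cluster. This is a minimal bash sketch, not from the original post; it assumes every pool uses 3 replicas and rounds to the nearest power of 2 as described above:

```bash
#!/bin/bash
# Recommended total PG count = (OSDs * 100) / replica count,
# rounded to the nearest power of 2.
osds=$(ceph osd ls | wc -l)        # 6 in this cluster
size=3                             # max replication count (assumed)
raw=$(( osds * 100 / size ))       # 200 here

pow=1
while (( pow * 2 <= raw )); do pow=$(( pow * 2 )); done
(( raw - pow > pow * 2 - raw )) && pow=$(( pow * 2 ))   # pick the closer power of 2

echo "raw: ${raw}, recommended total PGs: ${pow}"       # raw: 200, recommended total PGs: 256
```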
pool pg num
The PG count for each pool is calculated as:
```
Total PGs = ((Total_number_of_OSD * 100) / max_replication_count) / pool_count
```
The result is likewise rounded to the nearest power of 2.
For this example, each pool's pg_num is:
```
pool_count = 10
Total PGs = 200 / 10 = 20
```
So each pool should be assigned, on average, a pg_num of 16.
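The same kind of sketch works for the per-pool value (again assuming 3-replica pools; the OSD and pool counts are read from the cluster):

```bash
#!/bin/bash
# Per-pool pg_num = ((OSDs * 100) / replica count) / pool count,
# rounded to the nearest power of 2.
osds=$(ceph osd ls | wc -l)            # 6
pools=$(ceph osd pool ls | wc -l)      # 10
raw=$(( osds * 100 / 3 / pools ))      # 200 / 10 = 20

pow=1
while (( pow * 2 <= raw )); do pow=$(( pow * 2 )); done
(( raw - pow > pow * 2 - raw )) && pow=$(( pow * 2 ))

echo "suggested per-pool pg_num: ${pow}"   # 16 here
```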
pg num commands
Getting and setting pg_num and pgp_num for a given pool:
- `ceph osd pool create <pool-name> <pg-number> <pgp-number>`: create a new pool
- `ceph osd pool get <pool-name> pg_num`: get the number of PGs in a pool
- `ceph osd pool get <pool-name> pgp_num`: get the number of PGPs in a pool
- `ceph osd pool set <pool-name> pg_num <number>`: increase the number of PGs in a pool
- `ceph osd pool set <pool-name> pgp_num <number>`: increase the number of PGPs in a pool
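For example, raising one of the 8-PG RGW pools from the dump above to the calculated value of 16 (pg_num first, then pgp_num so the data actually rebalances) could look like this:

```
# ceph osd pool set default.rgw.log pg_num 16
# ceph osd pool set default.rgw.log pgp_num 16
# ceph osd pool get default.rgw.log pg_num
pg_num: 16
```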
If pg_num is not specified when creating a pool, it defaults to 8:
```
# ceph --show-config | grep osd_pool_default_pg_num
osd_pool_default_pg_num = 8
```
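To avoid this low default, pg_num and pgp_num can be given explicitly at creation time, e.g. for a hypothetical pool named test:

```
# ceph osd pool create test 16 16
pool 'test' created
```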