Attempt 1: Reactivate all OSDs directly
1. View the OSD tree
root@ceph01:~# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29279 root default
-2 0.14639 host ceph01
0 0.14639 osd.0 up 1.00000 1.00000
-3 0.14639 host ceph02
1 0.14639 osd.1 down 0 1.00000
The output shows that osd.1 is down.
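On a cluster with more OSDs, picking out the down ones by eye gets tedious. The sketch below filters the tree output for them. The function name list_down_osds is made up for illustration, and it assumes the column layout shown above (OSD name in column 3, state in column 4); other Ceph releases may order the columns differently.

```shell
# Print the names of OSDs that `ceph osd tree` reports as down.
# Assumption: OSD rows look like "1 0.14639 osd.1 down 0 1.00000",
# i.e. name in column 3 and up/down state in column 4.
list_down_osds() {
    awk '$3 ~ /^osd\.[0-9]+$/ && $4 == "down" { print $3 }'
}

# Usage on a live cluster (not executed here):
#   ceph osd tree | list_down_osds
```

For the tree above this would print only osd.1.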
2. Reactivate all OSDs (all of them, not only the one that is down)
In the command below, /dev/sdb1 is the disk or partition that each OSD node actually uses for storage.
ceph-deploy osd activate ceph01:/dev/sdb1 ceph02:/dev/sdb1
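Because every OSD node has to be listed, the HOST:DEVICE arguments can be generated rather than typed by hand. This is only a sketch: build_activate_targets is a hypothetical helper, and it assumes every node uses the same device path, as the two nodes here do.

```shell
# Build the HOST:DEVICE argument list for `ceph-deploy osd activate`.
#   $1   = device path shared by all nodes (assumption: identical on each host)
#   rest = host names
# The body runs in a subshell so its variables do not leak into the caller.
build_activate_targets() (
    device=$1
    shift
    targets=""
    for host in "$@"; do
        targets="${targets:+$targets }${host}:${device}"
    done
    printf '%s\n' "$targets"
)

# Usage (not executed here); the substitution is left unquoted on purpose
# so that each HOST:DEVICE pair becomes its own argument:
#   ceph-deploy osd activate $(build_activate_targets /dev/sdb1 ceph01 ceph02)
```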
3. Check the OSD tree and the cluster health
root@ceph01:~/my-cluster# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29279 root default
-2 0.14639 host ceph01
0 0.14639 osd.0 up 1.00000 1.00000
-3 0.14639 host ceph02
1 0.14639 osd.1 up 1.00000 1.00000
root@ceph01:~/my-cluster#
root@ceph01:~/my-cluster# ceph -s
cluster ecacda71-af9f-46f9-a2a3-a35c9e51db9e
health HEALTH_OK
monmap e1: 1 mons at {ceph01=10.111.131.125:6789/0}
election epoch 14, quorum 0 ceph01
osdmap e150: 2 osds: 2 up, 2 in
flags sortbitwise,require_jewel_osds
pgmap v9284: 64 pgs, 1 pools, 17 bytes data, 3 objects
10310 MB used, 289 GB / 299 GB avail
64 active+clean
The cluster counts as healthy only when the status is HEALTH_OK.
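The HEALTH_OK check can also be done in a script instead of by reading the output. A minimal sketch follows; is_health_ok is a hypothetical name, and the pattern assumes the status appears either alone on a line (as `ceph health` prints it) or after the word "health" as in the `ceph -s` output above. An optional colon after "health" is tolerated in case a release prints one.

```shell
# Succeed only if the status text on stdin reports HEALTH_OK.
# Matches "HEALTH_OK" alone on a line or "health HEALTH_OK" with
# optional leading whitespace; anything else (HEALTH_WARN, HEALTH_ERR) fails.
is_health_ok() {
    grep -Eq '^[[:space:]]*(health:?[[:space:]]+)?HEALTH_OK([[:space:]]|$)'
}

# Usage on a live cluster (not executed here):
#   if ceph -s | is_health_ok; then echo "cluster is healthy"; fi
```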
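Attempt 2: Repair the down OSD
This method is mainly for cases where an OSD is physically damaged and therefore cannot be activated.
1. View the OSD tree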
root@ceph01:~# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29279 root default
-2 0.14639 host ceph01
0 0.14639 osd.0 up 1.00000 1.00000
-3 0.14639 host ceph02
1 0.14639 osd.1 down 0 1.00000
The output shows that osd.1 is down.
2. Mark osd.1 as out
root@ceph02:~# ceph osd out osd.1
osd.1 is already out.
3. Remove it from the cluster
root@ceph02:~# ceph osd rm osd.1
removed osd.1
4. Remove it from the CRUSH map
root@ceph02:~# ceph osd crush rm osd.1
removed item id 1 name 'osd.1' from crush map
5. Delete the authentication key for osd.1
root@ceph02:~# ceph auth del osd.1
updated
6. Unmount the partition
umount /dev/sdb1
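Steps 2 to 6 are destructive, so it can help to print the exact commands for review before running any of them. The sketch below only prints; it never touches the cluster. print_osd_removal_steps is a hypothetical helper, and it simply mirrors the order used in this walkthrough.

```shell
# Print (do not run) the removal commands for one OSD, in the same
# order as steps 2-6 above.
#   $1 = numeric OSD id, $2 = device mounted for that OSD
# The body runs in a subshell, so `exit` only ends the function.
print_osd_removal_steps() (
    osd_id=$1
    device=$2
    case $osd_id in
        ''|*[!0-9]*)
            echo "OSD id must be a non-negative integer" >&2
            exit 1
            ;;
    esac
    cat <<EOF
ceph osd out osd.${osd_id}
ceph osd rm osd.${osd_id}
ceph osd crush rm osd.${osd_id}
ceph auth del osd.${osd_id}
umount ${device}
EOF
)

# Usage: review the printed commands, then run them by hand on the OSD's host.
#   print_osd_removal_steps 1 /dev/sdb1
```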
7. Check the OSD tree again
root@ceph02:~# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.14639 root default
-2 0.14639 host ceph01
0 0.14639 osd.0 up 1.00000 1.00000
-3 0 host ceph02
8. Log in to the ceph-deploy node
root@ceph01:~# cd /root/my-cluster/
root@ceph01:~/my-cluster#
9. Initialize the disk
ceph-deploy --overwrite-conf osd prepare ceph02:/dev/sdb1
10. Reactivate all OSDs (all of them, not only the one that was down)
ceph-deploy osd activate ceph01:/dev/sdb1 ceph02:/dev/sdb1
11. Check the OSD tree and the cluster health
root@ceph01:~/my-cluster# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29279 root default
-2 0.14639 host ceph01
0 0.14639 osd.0 up 1.00000 1.00000
-3 0.14639 host ceph02
1 0.14639 osd.1 up 1.00000 1.00000
root@ceph01:~/my-cluster#
root@ceph01:~/my-cluster# ceph -s
cluster ecacda71-af9f-46f9-a2a3-a35c9e51db9e
health HEALTH_OK
monmap e1: 1 mons at {ceph01=10.111.131.125:6789/0}
election epoch 14, quorum 0 ceph01
osdmap e150: 2 osds: 2 up, 2 in
flags sortbitwise,require_jewel_osds
pgmap v9284: 64 pgs, 1 pools, 17 bytes data, 3 objects
10310 MB used, 289 GB / 299 GB avail
64 active+clean
The cluster counts as healthy only when the status is HEALTH_OK.
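After an OSD has been re-created, the cluster may need some time before it reports HEALTH_OK, so a single check right after activation can be misleading. The sketch below re-runs a status command a limited number of times. wait_for_health_ok is a hypothetical helper; the attempt count and delay are arbitrary values to tune for your cluster.

```shell
# Re-run a status command until its output contains HEALTH_OK
# or the attempts run out.
#   $1 = maximum attempts, $2 = seconds to sleep between attempts,
#   rest = the command to run (for example: ceph health)
# The body runs in a subshell, so `exit` only ends the function.
wait_for_health_ok() (
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" | grep -q 'HEALTH_OK'; then
            exit 0
        fi
        # Sleep only if another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    exit 1
)

# Usage on a live cluster (not executed here): up to 30 checks, 10 s apart.
#   wait_for_health_ok 30 10 ceph health || echo "cluster is still not healthy" >&2
```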