Notes on how I cleanly shut down one physical host today. The environment is a 3-node HA setup with converged controller/storage/compute nodes, deployed with kolla and backed by Ceph storage; the node being shut down is controller03. The overall procedure: first live-migrate the VMs off the host, then set the `noout` flag on the Ceph cluster so that shutting down the node does not trigger OSD data rebalancing (avoiding heavy data churn), then disable the node in the web UI, and finally SSH in and power it off:
1. Live-migrate the VMs on this node: log in to the web management UI, Admin -> Instances -> select the VMs on this node -> Live Migrate -> choose another node, wait for the migration to finish, and verify;
2. Set the `noout` flag on the Ceph cluster (the flag is cluster-wide, so running it once against one monitor is enough): `docker exec -it ceph_mon ceph osd set noout` ;
3. Disable the node in the web UI: log in to the web management UI, Admin -> Hypervisors -> Compute Hosts -> select the corresponding host -> Disable Service ;
4. SSH to the node and power it off: `shutdown -h now` ;
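For repeatability, steps 1-4 can also be driven from the CLI instead of the web UI. A minimal sketch, assuming the deployment has the `nova` and `openstack` clients available; the hostname `controller03` matches this environment, and `DRY_RUN=1` (the default here) only prints each command so the sequence can be reviewed first:

```shell
#!/bin/sh
# Sketch of the node-drain procedure. DRY_RUN=1 (default) prints the
# commands instead of executing them; set DRY_RUN=0 to actually run.
NODE=controller03          # the host being shut down
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# 1. Live-migrate every VM off the node (CLI equivalent of the web UI step)
run nova host-evacuate-live "$NODE"
# 2. Set the cluster-wide noout flag so down OSDs are not marked out
run docker exec ceph_mon ceph osd set noout
# 3. Disable the compute service (CLI equivalent of "Disable Service")
run openstack compute service set --disable "$NODE" nova-compute
# 4. Power the node off over SSH
run ssh "$NODE" shutdown -h now
```

With `DRY_RUN=0`, step 1 still needs manual verification (watch the migrations complete) before moving on to the later steps.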
While the node shuts down, run `ceph -w` to watch in real time whether the cluster starts rebalancing OSD data:
```
[root@control02 mariadb]# docker exec -it ceph_mon ceph -w
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_WARN
            noout flag(s) set
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 72, quorum 0,1,2 192.168.1.130,192.168.1.131,192.168.1.132
     osdmap e466: 9 osds: 9 up, 9 in
            flags noout,sortbitwise,require_jewel_osds
      pgmap v712835: 640 pgs, 13 pools, 14902 MB data, 7300 objects
            30288 MB used, 824 GB / 854 GB avail
                 640 active+clean
```
Check the status with `ceph -s`:
```
[root@control01 kolla]# docker exec -it ceph_mon ceph osd set noout
set noout
[root@control01 kolla]# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_WARN
            412 pgs degraded
            404 pgs stuck unclean
            412 pgs undersized
            recovery 4759/14600 objects degraded (32.596%)
            3/9 in osds are down
            noout flag(s) set
            1 mons down, quorum 0,1 192.168.1.130,192.168.1.131
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 74, quorum 0,1 192.168.1.130,192.168.1.131
     osdmap e468: 9 osds: 6 up, 9 in; 412 remapped pgs
            flags noout,sortbitwise,require_jewel_osds
      pgmap v712931: 640 pgs, 13 pools, 14902 MB data, 7300 objects
            30285 MB used, 824 GB / 854 GB avail
            4759/14600 objects degraded (32.596%)
                 412 active+undersized+degraded
                 228 active+clean
[root@control01 kolla]#
[root@control01 kolla]# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_WARN
            412 pgs degraded
            405 pgs stuck unclean
            412 pgs undersized
            recovery 4759/14600 objects degraded (32.596%)
            3/9 in osds are down
            noout flag(s) set
            1 mons down, quorum 0,1 192.168.1.130,192.168.1.131
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 74, quorum 0,1 192.168.1.130,192.168.1.131
     osdmap e468: 9 osds: 6 up, 9 in; 412 remapped pgs
            flags noout,sortbitwise,require_jewel_osds
      pgmap v712981: 640 pgs, 13 pools, 14902 MB data, 7300 objects
            30285 MB used, 824 GB / 854 GB avail
            4759/14600 objects degraded (32.596%)
                 412 active+undersized+degraded
                 228 active+clean
  client io 7559 B/s rd, 20662 B/s wr, 11 op/s rd, 1 op/s wr
```
Three OSDs are down but still `in`, and the pgmap stays at 412 active+undersized+degraded, 228 active+clean the whole time, which confirms no data rebalancing took place.
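The 32.596% figure in the output is simply the degraded object-copy count divided by the total copy count (4759/14600 from the `recovery` line above); a quick sanity check of the arithmetic:

```shell
# Recompute the degraded percentage that ceph -s reports
awk 'BEGIN { printf "%.3f%%\n", 4759 / 14600 * 100 }'
# prints 32.596%
```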
Also, all the VMs were checked and are running normally.
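Once maintenance is finished the procedure should be reversed; the original notes stop at the shutdown, so the following is a hedged sketch of the bring-back steps under the same assumptions (`controller03`, standard `openstack`/`ceph` clients, `DRY_RUN=1` prints instead of executing):

```shell
#!/bin/sh
# Sketch of bringing controller03 back after maintenance (not from the
# original post). DRY_RUN=1 (default) only prints each command.
NODE=controller03
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Re-enable the compute service disabled in step 3
run openstack compute service set --enable "$NODE" nova-compute
# Clear the cluster-wide noout flag set in step 2, so normal
# OSD out-marking and recovery behavior resumes
run docker exec ceph_mon ceph osd unset noout
# Then watch `ceph -s` until all pgs return to active+clean
```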
Source: http://www.bubuko.com/infodetail-2544111.html