Purpose of the Experiment
Ordinarily a k8s cluster has a single master and multiple nodes, and when that master fails, the consequences are easy to imagine.
This article therefore demonstrates cluster high availability: when one master in the cluster goes down, the vip moves, another node is elected as the cluster master, and the cluster keeps running normally.
Because the focus is high availability of the master nodes, and because of resource constraints, the experiment uses 4 servers in total: 3 masters and 1 node.
At this point some readers may wonder: with 4 machines available, why not 2 masters? Because in a kubeadm-deployed cluster the etcd cluster runs on the master nodes by default, and etcd needs a majority (n/2+1) of members to keep quorum: a 3-member cluster needs 2 and so tolerates 1 failure, while a 2-member cluster needs 2 and tolerates none. With 2 masters, one going down immediately breaks the etcd cluster and with it the k8s cluster; once that foundation is gone, vip failover and the rest of the HA machinery are pointless.
Environment
Basic Information
# Host list
10.2.2.137 master1
10.2.2.166 master2
10.2.2.96  master3
10.2.3.27  node0
# Software versions
docker version: 18.09.9
k8s version: v1.16.4
Architecture
The cluster is built with kubeadm, and high availability is provided by a keepalived vip; the architecture is summarized below:
# Notes on the active/standby HA architecture
a) The apiserver is made highly available through keepalived; a node failure triggers vip failover.
b) controller-manager and scheduler elect a leader inside k8s (controlled by the leader-elect option, true by default); at any moment only one scheduler instance in the cluster is active.
c) When the cluster is built with kubeadm, etcd is automatically deployed as a cluster on the master nodes for high availability; an odd number of members is deployed, and a 3-member cluster tolerates at most one machine failure.
Environment Preparation
Notes
1. Most articles list every command step by step, which feels tedious to anyone with deployment experience, so most of the server shell operations in this article are consolidated into scripts.
2. The shell scripts used in this article: https://gitee.com/kazihuo/k8s-install-shell
3. All machines that will join the k8s cluster perform the operations in this part.
Steps
a) Set every server to its corresponding hostname; master1 is shown below, and the other machines follow in the next snippet.
# hostnamectl set-hostname master1   # the new hostname shows up after logging in again
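The remaining machines get the same treatment with their own names (hostnames and IPs taken from the host list above):
# hostnamectl set-hostname master2   # on 10.2.2.166
# hostnamectl set-hostname master3   # on 10.2.2.96
# hostnamectl set-hostname node0     # on 10.2.3.27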
b) Configure passwordless SSH from master1 to master2 and master3; this step runs on master1 only.
[root@master1 ~]# ssh-keygen -t rsa   # press Enter through all prompts
[root@master1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.2.2.166
[root@master1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.2.2.96
c) Apply the environment requirements with a script:
# sh set-prenv.sh
#!/bin/bash
#====================================================
# Author: LuoMuRui
# Blog: https://www.cnblogs.com/kazihuo
# Create Date: 2020-04-02
# Description: System environment configuration.
#====================================================

# Stop firewalld
firewall-cmd --state                 # check firewall state
systemctl stop firewalld.service     # stop firewalld
systemctl disable firewalld.service  # keep it from starting at boot

# Close selinux
getenforce    # check selinux state
setenforce 0  # turn selinux off for the current boot
sed -i 's/^ *SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config  # permanent (needs a reboot)

# Set local dns
cat >> /etc/hosts <<EOF
10.2.2.137 master1
10.2.2.166 master2
10.2.2.96 master3
10.2.3.27 node0
EOF

# Disable swap
swapoff -a                            # temporary
sed -i.bak '/swap/s/^/#/' /etc/fstab  # permanent

# Kernel parameter modification
# k8s networking uses flannel, which requires bridge-nf-call-iptables=1,
# and setting that parameter requires the br_netfilter kernel module.
lsmod | grep br_netfilter  # check whether the module is loaded
modprobe br_netfilter      # load it for the current boot
# Load the module at every boot; EOF is quoted so $file is written literally
# instead of being expanded (to nothing) while the file is created
cat > /etc/rc.sysinit <<'EOF'
#!/bin/bash
for file in /etc/sysconfig/modules/*.modules ; do
    [ -x $file ] && $file
done
EOF
cat > /etc/sysconfig/modules/br_netfilter.modules <<EOF
modprobe br_netfilter
EOF
chmod 755 /etc/sysconfig/modules/br_netfilter.modules
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf

# Set the yum repo of k8s
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum clean all && yum -y makecache
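A quick way to confirm the script took effect before moving on (illustrative spot checks; expected results in the comments):
# getenforce                                  # expect Permissive (Disabled after a reboot)
# firewall-cmd --state                        # expect "not running"
# swapon -s                                   # expect empty output
# sysctl net.bridge.bridge-nf-call-iptables   # expect net.bridge.bridge-nf-call-iptables = 1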
Software Installation
Installing docker
Note: all nodes perform the operations in this part!
# sh install-docker.sh
#!/bin/bash
#====================================================
# Author: LuoMuRui
# Blog: https://www.cnblogs.com/kazihuo
# Create Date: 2020-04-02
# Description: Install docker.
#====================================================

# Install dependency packages
yum install -y yum-utils device-mapper-persistent-data lvm2

# Add repo
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo

# List the available docker versions
yum list docker-ce --showduplicates | sort -r

# Install docker
yum install docker-ce-18.09.9 docker-ce-cli-18.09.9 containerd.io -y

# Mirror acceleration & change Cgroup Driver
# Switching the cgroup driver to systemd clears this kubeadm warning:
#   [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver.
#   The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

# Start && enable
systemctl daemon-reload
systemctl start docker
systemctl enable docker
docker --version

# Others
yum -y install bash-completion
source /etc/profile.d/bash_completion.sh
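To confirm the daemon picked up the new settings, check `docker info` (exact output wording can vary slightly between docker versions):
# docker info | grep -i 'cgroup driver'   # expect: Cgroup Driver: systemd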
Installing keepalived
Note: the three master nodes perform the operations in this part!
# Install
# yum -y install keepalived
# Configure
# keepalived configuration on master1
[root@master1 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id master1
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.2.2.6
    }
}
# keepalived configuration on master2
[root@master2 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id master2
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.2.2.6
    }
}
# keepalived configuration on master3
[root@master3 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id master3
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.2.2.6
    }
}
# Start
# service keepalived start
# systemctl enable keepalived
# Check the vip
[root@master1 ~]# ip a
# Failover check
Stop the keepalived service on master1 (or power master1 off) and master2 takes over the vip; if master2 then goes down as well, master3 takes over the vip. A short drill follows.
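A minimal drill under the configs above; since master1 has state MASTER and the highest priority, keepalived's default preemption moves the vip back once master1 returns:
[root@master1 ~]# systemctl stop keepalived    # simulate a failure
[root@master2 ~]# ip a | grep 10.2.2.6         # the vip now lives on master2
[root@master1 ~]# systemctl start keepalived
[root@master1 ~]# ip a | grep 10.2.2.6         # the vip has preempted back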
Installing k8s
Note: all nodes perform the operations in this part!
Components:
kubelet
runs on every node in the cluster; the agent that starts Pods, containers and other objects
kubeadm
the command-line tool that initializes and bootstraps the cluster
kubectl
the command line for talking to the cluster; used to deploy and manage applications, inspect resources, and create, delete and update components
# sh install-k8s.sh
#!/bin/bash
#====================================================
# Author: LuoMuRui
# Blog: https://www.cnblogs.com/kazihuo
# Create Date: 2020-04-02
# Description: Install the k8s packages.
#====================================================

# List the available versions
yum list kubelet --showduplicates | sort -r

# Install
yum install -y kubelet-1.16.4 kubeadm-1.16.4 kubectl-1.16.4

# Start && enable
systemctl enable kubelet && systemctl start kubelet

# Bash env
echo "source <(kubectl completion bash)" >> ~/.bash_profile
source /root/.bash_profile
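A quick check that all three tools landed at matching versions (illustrative):
# kubelet --version                  # expect: Kubernetes v1.16.4
# kubeadm version -o short           # expect: v1.16.4
# kubectl version --client --short   # expect: Client Version: v1.16.4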
Image Download
Note: all nodes perform the operations in this part!
Because of network restrictions in mainland China, images are pulled from an Aliyun mirror repository and re-tagged locally with the default names, so that kubeadm can use them normally when deploying the cluster.
# sh download-images.sh
#!/bin/bash
#====================================================
# Author: LuoMuRui
# Blog: https://www.cnblogs.com/kazihuo
# Create Date: 2020-04-02
# Description: Download images.
#====================================================
url=registry.cn-hangzhou.aliyuncs.com/loong576
version=v1.16.4
# ask kubeadm which images this version needs, keeping only the image name
images=(`kubeadm config images list --kubernetes-version=$version | awk -F '/' '{print $2}'`)
for imagename in ${images[@]} ; do
    docker pull $url/$imagename
    docker tag $url/$imagename k8s.gcr.io/$imagename
    docker rmi -f $url/$imagename
done
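After the script finishes, the locally tagged images should match what kubeadm expects for v1.16.4 (a spot check):
# docker images | grep k8s.gcr.io   # expect kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, pause, etcd and coredns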
master Initialization
Initialization steps
[root@master1 ~]# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.4
apiServer:
  certSANs:
  - 10.2.2.6
controlPlaneEndpoint: "10.2.2.6:6443"
networking:
  podSubnet: "10.244.0.0/16"
[root@master1 ~]# kubeadm init --config=kubeadm-config.yaml
# A successful init ends with the kubeadm join commands; record them:
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

  kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
    --discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61 \
    --control-plane

Then you can join any number of worker nodes by running the following on each as root:

  kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
    --discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61

# A failed init can be reset and retried:
# kubeadm reset
# rm -rf $HOME/.kube/config
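If you want to validate the config before committing to a real init, kubeadm also supports a dry run, which prints what would be done without changing the host:
# kubeadm init --config=kubeadm-config.yaml --dry-run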
Adding the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
Installing the flannel plugin
# wget https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
# kubectl apply -f kube-flannel.yml
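Once the flannel pods are running, the master should flip from NotReady to Ready (quick checks):
# kubectl get pods -n kube-system | grep flannel   # expect Running
# kubectl get nodes                                # master1 should report Ready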
Joining the control-plane Nodes
Certificate distribution
# master1 copies the certificates to the other master nodes
[root@master1 ~]# sh cert-others-master.sh
USER=root
CONTROL_PLANE_IPS="master2 master3"
for host in ${CONTROL_PLANE_IPS}; do
    scp /etc/kubernetes/pki/ca.crt "${USER}"@$host:
    scp /etc/kubernetes/pki/ca.key "${USER}"@$host:
    scp /etc/kubernetes/pki/sa.key "${USER}"@$host:
    scp /etc/kubernetes/pki/sa.pub "${USER}"@$host:
    scp /etc/kubernetes/pki/front-proxy-ca.crt "${USER}"@$host:
    scp /etc/kubernetes/pki/front-proxy-ca.key "${USER}"@$host:
    scp /etc/kubernetes/pki/etcd/ca.crt "${USER}"@$host:etcd-ca.crt
    # Comment out this line if you are using external etcd
    scp /etc/kubernetes/pki/etcd/ca.key "${USER}"@$host:etcd-ca.key
done
# master2 and master3 put the certificates in place
[root@master2 ~]# sh cert-set.sh
[root@master3 ~]# sh cert-set.sh
USER=root                           # files were copied into /root by the previous script
mkdir -p /etc/kubernetes/pki/etcd   # make sure the target directories exist
mv /${USER}/ca.crt /etc/kubernetes/pki/
mv /${USER}/ca.key /etc/kubernetes/pki/
mv /${USER}/sa.pub /etc/kubernetes/pki/
mv /${USER}/sa.key /etc/kubernetes/pki/
mv /${USER}/front-proxy-ca.crt /etc/kubernetes/pki/
mv /${USER}/front-proxy-ca.key /etc/kubernetes/pki/
mv /${USER}/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
# Comment out this line if you are using external etcd
mv /${USER}/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key
Joining the other masters
# master2 and master3 join the cluster; master2 is shown below and master3 follows the same steps
[root@master2 ~]# kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
    --discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61 \
    --control-plane
[root@master2 ~]# scp master1:/etc/kubernetes/admin.conf /etc/kubernetes/
[root@master2 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile && source ~/.bash_profile
Checking the cluster nodes
# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
master1   Ready    master   20h   v1.16.4
master2   Ready    master   20h   v1.16.4
master3   Ready    master   19h   v1.16.4
Joining the node
Join step
[root@node0 ~]# kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
    --discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61
Checking the nodes
# kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
node0     Ready    <none>   18h   v1.16.4
master1   Ready    master   20h   v1.16.4
master2   Ready    master   20h   v1.16.4
master3   Ready    master   19h   v1.16.4
Verifying Cluster Failover
Steps
# Power master1 off to simulate an outage
[root@master1 ~]# init 0
# The vip drifts to master2
[root@master2 ~]# ip a | grep '2.6'
    inet 10.2.2.6/32 scope global eth0
# The controller-manager and scheduler leaders migrate to surviving masters
# kubectl get endpoints kube-controller-manager -n kube-system -o yaml | grep holderIdentity
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master3_885468ec-f9ce-4cc6-93d6-235508b5a130","leaseDurationSeconds":15,"acquireTime":"2020-04-01T10:15:28Z","renewTime":"2020-04-02T06:02:46Z","leaderTransitions":8}'
# kubectl get endpoints kube-scheduler -n kube-system -o yaml | grep holderIdentity
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master2_cf16fd61-0202-4610-9a27-3cd9d26b4141","leaseDurationSeconds":15,"acquireTime":"2020-04-01T10:15:25Z","renewTime":"2020-04-02T06:03:09Z","leaderTransitions":9}'
# Creating pods on the cluster still works
# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  selector:
    matchLabels:
      App: nginx
  replicas: 3
  template:
    metadata:
      labels:
        App: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
# kubectl apply -f nginx.yaml
# kubectl get pods
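To confirm etcd itself kept quorum with one member down, the surviving members can be probed. A sketch, assuming the kubeadm defaults used in this article: the etcd pod is named etcd-<hostname> and the certificates sit under /etc/kubernetes/pki/etcd inside the pod; adjust if your layout differs:
# kubectl -n kube-system exec etcd-master2 -- sh -c \
    'ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
     --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
     endpoint health'   # expect: https://127.0.0.1:2379 is healthy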
Conclusion
1. With 3 master nodes in the cluster, the cluster keeps working normally no matter which single node goes down.
2. If 2 of the 3 master nodes fail, etcd loses quorum, the etcd cluster breaks, and the whole k8s cluster becomes abnormal!
Source: https://www.cnblogs.com/kazihuo/p/12619719.html