This post records a number of problems encountered while rolling out docker and kubernetes, together with their solutions.
Part of this material is excerpted from the internet; the rest are real issues I ran into in my own test environment. I will keep adding problems that come up while developing and operating docker/k8s, so that those who follow can avoid some detours.
Series table of contents
kubernetes NodePort is not reachable
Environment:
- OS: CentOS 7.1
- Kubelet: 1.6.7
- Docker: 17.06-ce
- Calico: 2.3
- K8s cluster: master, node-1, node-2
Problem:
A service A should be reachable from outside the cluster, so its service type is set to NodePort, with port 31246.
The pod backing A runs on node-1.
Testing shows that external requests to master:31246 and node-2:31246 both fail; only node-1:31246 works.
Cause:
For security reasons, starting with version 1.13 docker sets the default policy of the iptables FORWARD chain to DROP, and adds explicit accept rules for containers attached to the docker0 bridge. Quoting the description from moby issue #14041:
- When docker starts, it enables net.ipv4.ip_forward without changing the iptables FORWARD chain default policy to DROP. This means that another machine on the same network as the docker host can add a route to their routing table, and directly address any containers running on that docker host.
- For example, if the docker0 subnet is 172.17.0.0/16 (the default subnet), and the docker host's IP address is 192.168.0.10, from another host on the network run:

```shell
$ ip route add 172.17.0.0/16 via 192.168.0.10
$ nmap 172.17.0.0/16
```

- The above will scan for containers running on the host, and report IP addresses & running services found.
- To fix this, docker needs to set the FORWARD policy to DROP when it enables the net.ipv4.ip_forward sysctl parameter.
The CNI plugins used by kubernetes are affected by this (CNI does not create matching rules in the FORWARD chain), so every node except the one hosting the pod is unable to forward the packets, and access fails.
Workaround:
If your security requirements allow it, set the default policy of the FORWARD chain back to ACCEPT:

```shell
iptables -P FORWARD ACCEPT
```
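Note that this setting is not persistent: docker sets the FORWARD policy back to DROP whenever it restarts. One possible way to reapply it automatically is a systemd drop-in for the docker service (the drop-in path and file name below are just one choice, not something docker itself prescribes):

```
# /etc/systemd/system/docker.service.d/10-forward-accept.conf
[Service]
ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT
```

Afterwards run `systemctl daemon-reload && systemctl restart docker`.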
Google: network unreachable

```
https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml: [Errno 14] curl#7 - "Failed to connect to 2404:6800:4005:809::200e: Network is unreachable"
```

A classic problem: you will need your own proxy to reach Google's servers.
Setting a yum proxy on CentOS:

```shell
vi /etc/yum.conf
# add the following line:
proxy=http://xxx.xx.x.xx:xxx   # proxy address
```
Without a proxy, switch to the Aliyun mirror instead:

```shell
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
EOF
```
Disable swap

```
F1213 10:20:53.304755 2266 server.go:261] failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained:
```

Run `swapoff -a`.
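`swapoff -a` only lasts until the next reboot; to keep swap off permanently, the swap entries in /etc/fstab also have to be commented out. A small sketch of that edit, demonstrated on a scratch copy (the sample entries below are made up; on a real node run the same `sed` against /etc/fstab as root):

```shell
# build a sample fstab to demonstrate on (placeholder content)
printf '%s\n' 'UUID=abcd-1234 / ext4 defaults 0 0' \
              '/dev/sda2 swap swap defaults 0 0' > /tmp/fstab.demo

# comment out every line that mentions a swap filesystem
sed -i '/\bswap\b/ s/^[^#]/#&/' /tmp/fstab.demo

cat /tmp/fstab.demo
```

After this, the swap line starts with `#` and the root filesystem line is untouched.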
Errors while setting up the master

```
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
```

Set both values to 1, as the message suggests:

```shell
echo "1" > /proc/sys/net/ipv4/ip_forward
echo "1" > /proc/sys/net/bridge/bridge-nf-call-iptables
```
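Writing into /proc only lasts until the next reboot. To make both settings permanent they can go into a sysctl drop-in (the file name here is just an example):

```
# /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
```

Apply with `sysctl --system`. Note that the `net.bridge.*` keys only exist once the br_netfilter kernel module is loaded (`modprobe br_netfilter`).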
kubeadm init hangs while pulling images

```shell
# list the images required for this version
kubeadm config images list --kubernetes-version v1.13.0
# pull them from a mirror registry
docker pull mirrorgooglecontainers/kube-apiserver:v1.13.0
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.0
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.0
docker pull mirrorgooglecontainers/kube-proxy:v1.13.0
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker pull coredns/coredns:1.2.6
# retag them to the names kubeadm expects
docker tag docker.io/mirrorgooglecontainers/kube-proxy:v1.13.0 k8s.gcr.io/kube-proxy:v1.13.0
docker tag docker.io/mirrorgooglecontainers/kube-scheduler:v1.13.0 k8s.gcr.io/kube-scheduler:v1.13.0
docker tag docker.io/mirrorgooglecontainers/kube-apiserver:v1.13.0 k8s.gcr.io/kube-apiserver:v1.13.0
docker tag docker.io/mirrorgooglecontainers/kube-controller-manager:v1.13.0 k8s.gcr.io/kube-controller-manager:v1.13.0
docker tag docker.io/mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag docker.io/mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag docker.io/coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
# remove the mirror tags
docker rmi docker.io/mirrorgooglecontainers/kube-proxy:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/kube-scheduler:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/kube-apiserver:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/kube-controller-manager:v1.13.0
docker rmi docker.io/mirrorgooglecontainers/etcd:3.2.24
docker rmi docker.io/mirrorgooglecontainers/pause:3.1
docker rmi docker.io/coredns/coredns:1.2.6
```
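The pull/retag/cleanup commands above can be condensed into a loop, since the mirror name differs from the target only in the repository prefix. A sketch that prints the commands instead of running them (drop the `echo`s to execute for real; coredns is the one image that lives in its own repository rather than under mirrorgooglecontainers):

```shell
# images hosted under the mirrorgooglecontainers mirror
images="kube-apiserver:v1.13.0 kube-controller-manager:v1.13.0 \
kube-scheduler:v1.13.0 kube-proxy:v1.13.0 pause:3.1 etcd:3.2.24"

for img in $images; do
  echo docker pull "mirrorgooglecontainers/${img}"
  echo docker tag "mirrorgooglecontainers/${img}" "k8s.gcr.io/${img}"
  echo docker rmi "mirrorgooglecontainers/${img}"
done

# coredns comes from its own repository
echo docker pull coredns/coredns:1.2.6
echo docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
echo docker rmi coredns/coredns:1.2.6
```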
At last, the message we were waiting for:

```
Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.232.204:6443 --token m2hxkd.scxjrxgew6pyhvmb --discovery-token-ca-cert-hash sha256:8b94cefbe54ae4b3d7201012db30966c53870aad55be80a2888ec0da178c3610
```
Network configuration

```
# my VMs' /etc/hosts entries
192.168.232.204 k8a204
192.168.232.203 k8a203
192.168.232.202 k8a202
```

Install your chosen network add-on as its manual describes, wait until the DNS pods are up, then add the nodes.
```
NAME     STATUS   ROLES    AGE    VERSION
k8a204   Ready    master   6m6s   v1.13.0

[root@localhost .kube]# kubectl get nodes
NAME     STATUS     ROLES    AGE     VERSION
k8a203   NotReady   <none>   4s      v1.13.0
k8a204   Ready      master   6m19s   v1.13.0
```
Note: bringing everything up takes a while; be patient.

```
kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS              RESTARTS   AGE
kube-system   coredns-86c58d9df4-2vdvx         1/1     Running             0          7m32s
kube-system   coredns-86c58d9df4-88fjk         1/1     Running             0          7m32s
kube-system   etcd-k8a204                      1/1     Running             0          6m39s
kube-system   kube-apiserver-k8a204            1/1     Running             0          6m30s
kube-system   kube-controller-manager-k8a204   1/1     Running             0          6m30s
kube-system   kube-proxy-tl7g5                 1/1     Running             0          7m32s
kube-system   kube-proxy-w2jgl                 0/1     ContainerCreating   0          95s
kube-system   kube-scheduler-k8a204            1/1     Running             0          6m49s
```
Node NotReady after joining

Continuing from the previous problem:
while a pod is in ContainerCreating, just wait; but if there is still no progress after ten minutes or so, something has definitely gone wrong.
The most common cause: the node cannot pull the images.
Work around it as follows:
1) On the master, save the images to tar files:

```shell
docker save -o /opt/kube-pause.tar k8s.gcr.io/pause:3.1
docker save -o /opt/kube-proxy.tar k8s.gcr.io/kube-proxy:v1.13.0
docker save -o /opt/kube-flannel1.tar quay.io/coreos/flannel:v0.9.1
docker save -o /opt/kube-flannel2.tar quay.io/coreos/flannel:v0.10.0-amd64
docker save -o /opt/kube-calico1.tar quay.io/calico/cni:v3.3.2
docker save -o /opt/kube-calico2.tar quay.io/calico/node:v3.3.2
```

2) Copy the files to the node:

```shell
scp /opt/*.tar root@192.168.232.203:/opt/
```

3) On the node, import them into docker:

```shell
docker load -i /opt/kube-flannel1.tar
docker load -i /opt/kube-flannel2.tar
docker load -i /opt/kube-proxy.tar
docker load -i /opt/kube-pause.tar
docker load -i /opt/kube-calico1.tar
docker load -i /opt/kube-calico2.tar
```

4) Check the images on the node:

```shell
docker images
```
```
REPOSITORY               TAG             IMAGE ID       CREATED         SIZE
k8s.gcr.io/kube-proxy    v1.13.0         8fa56d18961f   9 days ago      80.2 MB
quay.io/calico/node      v3.3.2          4e9be81e3a59   9 days ago      75.3 MB
quay.io/calico/cni       v3.3.2          490d921fa49c   9 days ago      75.4 MB
quay.io/coreos/flannel   v0.10.0-amd64   f0fad859c909   10 months ago   44.6 MB
k8s.gcr.io/pause         3.1             da86e6ba6ca1   11 months ago   742 kB
quay.io/coreos/flannel   v0.9.1          2b736d06ca4c   13 months ago   51.3 MB
```
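The four save/copy/load steps above can also be sketched as one loop, deriving each tar file name from the image name. This prints the commands rather than executing them (drop the `echo`s to run for real; the node address is the one from this cluster, and the `ssh ... docker load` line assumes the tars have already been scp'd to /opt/ on the node):

```shell
NODE=192.168.232.203
images="k8s.gcr.io/pause:3.1 k8s.gcr.io/kube-proxy:v1.13.0 \
quay.io/calico/cni:v3.3.2 quay.io/calico/node:v3.3.2"

for img in $images; do
  # turn e.g. quay.io/calico/cni:v3.3.2 into /opt/quay.io-calico-cni-v3.3.2.tar
  f="/opt/$(echo "$img" | tr '/:' '--').tar"
  echo docker save -o "$f" "$img"                # on the master
  echo ssh root@"$NODE" docker load -i "$f"      # on the node, after scp
done
```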
Done: all services are Running.

```
[root@localhost .kube]# kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   calico-node-4dsg5                1/2     Running   0          42m
kube-system   calico-node-5dtk2                1/2     Running   0          41m
kube-system   calico-node-78qvp                1/2     Running   0          41m
kube-system   coredns-86c58d9df4-26vr7         1/1     Running   0          43m
kube-system   coredns-86c58d9df4-s5ljf         1/1     Running   0          43m
kube-system   etcd-k8a204                      1/1     Running   0          42m
kube-system   kube-apiserver-k8a204            1/1     Running   0          42m
kube-system   kube-controller-manager-k8a204   1/1     Running   0          42m
kube-system   kube-proxy-8c7hs                 1/1     Running   0          41m
kube-system   kube-proxy-dls8l                 1/1     Running   0          41m
kube-system   kube-proxy-t65tc                 1/1     Running   0          43m
kube-system   kube-scheduler-k8a204            1/1     Running   0          42m
```
Recovering the master after a reboot

```shell
swapoff -a
# start all containers
# (a more concise form: docker start $(docker ps -aq))
docker start $(docker ps -a | awk '{ print $1}' | tail -n +2)
systemctl start kubelet
# inspect startup errors
journalctl -xefu kubelet
# --restart=always is a per-container restart policy, e.g.
#   docker run --restart=always <image>
# to start the docker daemon itself on boot:
systemctl enable docker
```
DNS fails to resolve kubernetes.default

Installed busybox for a DNS check, and kept getting the following error:

```
kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address:   10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
```

It turns out the DNS resolver in newer busybox images has changed (or is buggy); with an older image (busybox <= 1.28.4) the lookup works.
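For reference, a minimal pod manifest for this check with the image pinned to a known-good version (the pod name and namespace are arbitrary choices):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28.4
    command: ["sleep", "3600"]
  restartPolicy: Never
```

Apply it with `kubectl apply -f`, then rerun `kubectl exec -ti busybox -- nslookup kubernetes.default`.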
Regenerating the token after it expires

```shell
# generate a new token
kubeadm token create
# compute the CA certificate hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
# join the node using the new token and hash
# (replace the master address, token and hash with your own)
kubeadm join 192.168.232.204:6443 --token m87q91.gbcqhfx9ansvaf3o --discovery-token-ca-cert-hash sha256:fdd34ef6c801e382f3fb5b87bc9912a120bf82029893db121b9c8eae29e91c62
```
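The `--discovery-token-ca-cert-hash` value is just the SHA-256 digest of the DER-encoded public key of the cluster CA. The pipeline can be tried without a cluster by generating a throwaway self-signed cert (the /tmp paths and CN below are made up for the demo; on a real master use /etc/kubernetes/pki/ca.crt):

```shell
# create a throwaway CA certificate to demonstrate the hash pipeline
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
  -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt -days 1 2>/dev/null

# extract the public key, DER-encode it, hash it, strip the "(stdin)= " prefix
hash=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //')

echo "sha256:${hash}"
```

The printed `sha256:<64 hex chars>` string is exactly the format `kubeadm join` expects.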
Source: https://www.cnblogs.com/tylerzhou/p/10975062.html