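Problem description
I installed a Kubernetes cluster on CentOS 7 using the yum package manager. After successfully creating a service with kubectl, running kubectl get pods showed that the AGE column kept increasing while the status never changed (it stayed at ContainerCreating).
What this post covers
An analysis of the cause
A direct fix for the problem (imperfect)
An alternative approach
Let me walk you through it~
Analysis and fix
kubectl provides the describe subcommand, which prints detailed information about one or more specified resources.
Run kubectl describe pod mytomcat-9lcq5 to inspect the state of the problem Pod. The output is: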
- [root@kube-master App]# kubectl describe pod mytomcat-9lcq5
- Name: mytomcat-9lcq5
- Namespace: default
- Node: kube-node-2/192.168.87.145
- Start Time: Fri, 17 Apr 2020 15:53:50 +0800
- Labels: App=mytomcat
- Status: Pending
- IP:
- Controllers: ReplicationController/mytomcat
- Containers:
- mytomcat:
- Container ID:
- Image: tomcat:9-jre8-alpine
- Image ID:
- Port: 8080/TCP
- State: Waiting
- Reason: ContainerCreating
- Ready: False
- Restart Count: 0
- Volume Mounts: <none>
- Environment Variables: <none>
- Conditions:
- Type Status
- Initialized True
- Ready False
- PodScheduled True
- No volumes.
- QoS Class: BestEffort
- Tolerations: <none>
- Events:
- FirstSeen LastSeen Count From SubObjectPath Type Reason Message
- --------- -------- ----- ---- ------------- -------- ------ -------
- 5m 5m 1 {default-scheduler } Normal Scheduled Successfully assigned mytomcat-9lcq5 to kube-node-2
- 4m 4m 1 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Get https://registry.access.redhat.com/v1/_ping: net/http: TLS handshake timeout)"
- 3m 3m 1 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Network timed out while trying to connect to https://registry.access.redhat.com/v1/repositories/rhel7/pod-infrastructure/images. You may want to check your internet connection or if you are behind a proxy.)"
- 2m 2m 1 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Error: image rhel7/pod-infrastructure:latest not found)"
- 3m 1m 3 {kubelet kube-node-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redhat.com/rhel7/pod-infrastructure:latest\""
Look at the Events at the bottom of the output. Successfully assigned mytomcat-9lcq5 to kube-node-2 means the Pod was scheduled onto the host kube-node-2, and creating the Pod on that host then failed.
The reason given is image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request.
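When the Events list is long, it can help to extract just the image names that failed to pull. The following is a minimal sketch and not part of the original troubleshooting session: failed_images is a hypothetical helper name, and it assumes the event text has the form "image pull failed for <image>," as in the output above.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# each image named in an "image pull failed for <image>," event, once.
failed_images() {
  grep -o 'image pull failed for [^, ]*' \
    | sed 's/^image pull failed for //' \
    | sort -u
}

# Usage against a live cluster (Pod name taken from the output above):
#   kubectl describe pod mytomcat-9lcq5 | failed_images
```

Newer Kubernetes versions word these events differently, so the pattern would need adjusting there.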
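From this we can infer that pulling an image from Red Hat's own docker registry requires a CA certificate for verification before the pull can succeed.
docker keeps its certificates under /etc/docker/certs.d. The domain in the error message is registry.access.redhat.com, so the certificate belongs in the subdirectory of that name.
Listing that directory with ll shows that /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt is a symbolic link (what is a symbolic link? https://www.runoob.com/linux/linux-comm-ln.html ) pointing to /etc/rhsm/ca/redhat-uep.pem.
Anyone familiar with symbolic links knows that a link shown blinking red has a target that does not exist, so the certificate file /etc/rhsm/ca/redhat-uep.pem has to be generated.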
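The blinking red colour depends on the terminal's ls colour settings, so a scriptable check can be handy. This is a small sketch under that motivation: is_dangling_symlink is a hypothetical helper name, and it relies only on the standard shell tests -L (is a symbolic link) and -e (target exists).

```shell
# Hypothetical helper: succeed only if $1 is a symbolic link whose target is missing.
is_dangling_symlink() {
  [ -L "$1" ] && [ ! -e "$1" ]
}

# Usage on a worker node (path from the text above):
#   is_dangling_symlink /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt \
#     && echo "link target is missing"
```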
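Generate the certificate:
# openssl s_client -showcerts -servername registry.access.redhat.com -connect registry.access.redhat.com:443 </dev/null 2>/dev/null | openssl x509 -text > /etc/rhsm/ca/redhat-uep.pem
This command sometimes fails with unable to load certificate 139930742028176:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:707:Expecting: TRUSTED CERTIFICATE. That usually means the connection to the registry returned no certificate, so openssl x509 had nothing to parse. Running the command again fixes it.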
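Since the command has to be re-run by hand when it fails, one option is to wrap it in a retry loop that checks the result. This is only a sketch under assumptions: has_pem_cert and fetch_registry_cert are hypothetical names, the default of three attempts and the two-second pause are arbitrary, and "contains a PEM certificate header" is used as a rough validity check.

```shell
# Hypothetical helper: succeed if the file contains a PEM certificate header.
has_pem_cert() {
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1" 2>/dev/null
}

# Hypothetical helper: run the openssl pipeline from the text up to N times.
#   $1 = registry host, $2 = output file, $3 = attempts (default 3, an assumption)
fetch_registry_cert() {
  host=$1
  out=$2
  tries=${3:-3}
  i=1
  while [ "$i" -le "$tries" ]; do
    # A failed connection leaves an empty output file, which the check below catches.
    openssl s_client -showcerts -servername "$host" -connect "$host:443" \
      </dev/null 2>/dev/null | openssl x509 -text > "$out" 2>/dev/null
    if has_pem_cert "$out"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Usage on a worker node:
#   fetch_registry_cert registry.access.redhat.com /etc/rhsm/ca/redhat-uep.pem
```

Note that the pipeline saves the certificate presented by the registry server, and that a failed attempt still creates the output file, so checking that the file merely exists is not enough.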
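After the command completes, check the certificate file that the symbolic link points to:
- [root@kube-node-2 registry.access.redhat.com]# ll /etc/rhsm/ca/redhat-uep.pem
- -rw-r--r-- 1 root root 9233 Apr 17 16:55 /etc/rhsm/ca/redhat-uep.pem
The certificate file now exists. On the k8s master node kube-master, delete the Pods from before and wait for them to be re-created successfully (the second node failed to pull the image because of network problems......)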
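The delete-and-wait step can be sketched as follows. recreate_pod is a hypothetical helper name. It relies on the ReplicationController listed in the describe output (Controllers: ReplicationController/mytomcat) to create the replacement Pod, so it would not bring back a bare Pod that has no controller.

```shell
# Hypothetical helper: delete one Pod by name, then list Pods with their nodes.
# The ReplicationController creates a replacement, which gets a new name suffix.
recreate_pod() {
  kubectl delete pod "$1" && kubectl get pods -o wide
}

# Usage on kube-master:
#   recreate_pod mytomcat-9lcq5
```

Repeat kubectl get pods until the new Pod leaves ContainerCreating.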
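That completes the creation of the Pod.
Some problems remain, though. Access to overseas networks from within China is occasionally unreliable, which can make Pod creation fail. describe then shows the same message as before, even though the certificate file exists and has content.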
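Root cause and an alternative approach
The k8s master node schedules the Pod onto a worker node. Once the Pod arrives there, the worker pulls the Pod base image pod-infrastructure:latest from the Red Hat docker registry. That registry uses https and its certificate must be verified, so a missing certificate makes the pull fail.
In addition, because the image lives in the Red Hat docker registry, the TLS handshake can fail from networks in China and the image cannot be downloaded.
So the problem becomes how to fix the failed pull of the k8s pod-infrastructure image. Here is one approach, with the following steps: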
Pull a pod-infrastructure image that someone else has uploaded to the official docker registry: docker pull tianyebj/pod-infrastructure
Add a tag that points at the private registry address, e.g.: docker tag tianyebj/pod-infrastructure 10.2.7.70:5000/dev/pod-infrastructure
Push the image to the private registry, e.g.: docker push 10.2.7.70:5000/dev/pod-infrastructure
On every worker node, edit /etc/kubernetes/kubelet and replace registry.access.redhat.com/rhel7/pod-infrastructure with the tag you just set
sed -i "s#registry.access.redhat.com/rhel7/pod-infrastructure#<private registry pod-infrastructure image tag>#" /etc/kubernetes/kubelet
Restart kubelet on every worker node with systemctl restart kubelet, and you are done
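The steps above can be collected into two small functions. This is a sketch rather than a finished script: mirror_image and set_pod_infra_image are hypothetical names, the registry address 10.2.7.70:5000 is the example address from the steps, and the sed call assumes GNU sed (as on CentOS 7) and a replacement string free of '#', '&' and backslashes.

```shell
# Hypothetical helper: copy an image into a private registry.
#   $1 = source image, $2 = target image name in the private registry
mirror_image() {
  docker pull "$1" && docker tag "$1" "$2" && docker push "$2"
}

# Hypothetical helper: point a kubelet config file at the mirrored image.
#   $1 = config file, $2 = new image repository (no '#', '&' or '\' characters)
set_pod_infra_image() {
  sed -i "s#registry.access.redhat.com/rhel7/pod-infrastructure#$2#" "$1"
}

# Usage (run the first line once, the other two on every worker node):
#   mirror_image tianyebj/pod-infrastructure 10.2.7.70:5000/dev/pod-infrastructure
#   set_pod_infra_image /etc/kubernetes/kubelet 10.2.7.70:5000/dev/pod-infrastructure
#   systemctl restart kubelet
```

The substitution replaces only the repository part, so an existing :latest suffix in the config file is kept. That matches the mirrored image, because docker tag without an explicit tag defaults to latest.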
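Notes:
The uploaded image must be set to public, otherwise kubelet itself has no permission to pull it. Alternatively, SSH into each worker node, log in to the registry, and run
docker pull <private registry pod-infrastructure image tag>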
References
Source: https://www.cnblogs.com/hellxz/p/k8s-pod-always-container-creating-status-problem.html