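Understanding Liveness and Readiness in k8s
Liveness:
Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, which is then handled according to its restart policy. If no liveness probe is configured, the state defaults to Success.
Readiness:
Indicates whether the container is ready to serve requests. If the readiness probe fails, the endpoints controller removes the Pod's IP from the endpoints. Before the initial delay has elapsed, the state defaults to Failure. If no readiness probe is configured, the state defaults to Success.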
Code analysis
Based on Kubernetes 1.11.0
1. Starting the probes
When the kubelet starts, it also starts the health-check probing:
The Run method in kubelet.go
- ...
- kl.probeManager.Start() // start the probe manager
- ...
2. What the probeManager does
Take a look at this code in prober_manager.go:
- // Manager manages pod probing. It creates a probe "worker" for every container that specifies a
- // probe (AddPod). The worker periodically probes its assigned container and caches the results. The
- // manager use the cached probe results to set the appropriate Ready state in the PodStatus when
- // requested (UpdatePodStatus). Updating probe parameters is not currently supported.
- // TODO: Move liveness probing out of the runtime, to here.
- type Manager interface {
- // AddPod creates new probe workers for every container probe. This should be called for every
- // pod created.
- AddPod(pod *v1.Pod)
- // RemovePod handles cleaning up the removed pod state, including terminating probe workers and
- // deleting cached results.
- RemovePod(pod *v1.Pod)
- // CleanupPods handles cleaning up pods which should no longer be running.
- // It takes a list of "active pods" which should not be cleaned up.
- CleanupPods(activePods []*v1.Pod)
- // UpdatePodStatus modifies the given PodStatus with the appropriate Ready state for each
- // container based on container running status, cached probe results and worker states.
- UpdatePodStatus(types.UID, *v1.PodStatus)
- // Start starts the Manager sync loops.
- Start()
- }
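This is the interface declaration of Manager, which is responsible for probing pods. When AddPod is called, it creates a probe worker for every container in the Pod that specifies a probe. Each worker probes its assigned container periodically and caches the results. When UpdatePodStatus is called, the manager uses the cached probe results to set the appropriate Ready state in the PodStatus.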
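The worker-and-cache arrangement described above can be sketched roughly as follows. This is only an illustration of the idea, not the kubelet's code: `Result`, `resultCache`, and `runWorker` are hypothetical names made up for this example, and the real worker handles many more cases (initial delay, thresholds, container restarts).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Result is a simplified stand-in for the kubelet's probe result type.
type Result int

const (
	Failure Result = iota
	Success
)

// resultCache is a hypothetical cache holding the latest result per container ID.
type resultCache struct {
	mu      sync.RWMutex
	results map[string]Result
}

func newResultCache() *resultCache {
	return &resultCache{results: make(map[string]Result)}
}

// Set stores the latest probe result for a container.
func (c *resultCache) Set(id string, r Result) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.results[id] = r
}

// Get returns the cached result and whether one exists.
func (c *resultCache) Get(id string) (Result, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	r, ok := c.results[id]
	return r, ok
}

// runWorker probes one container immediately and then once per period,
// caching every result, until stop is closed.
func runWorker(id string, period time.Duration, probe func() Result, cache *resultCache, stop <-chan struct{}) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		cache.Set(id, probe())
		select {
		case <-stop:
			return
		case <-ticker.C:
		}
	}
}

func main() {
	cache := newResultCache()
	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		runWorker("container-a", 10*time.Millisecond, func() Result { return Success }, cache, stop)
		close(done)
	}()
	time.Sleep(50 * time.Millisecond)
	close(stop)
	<-done
	r, ok := cache.Get("container-a")
	fmt.Println("cached:", ok, "success:", r == Success)
}
```

A caller in the role of UpdatePodStatus would only read from the cache, so computing the Ready state never has to wait for a probe to finish.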
3. A closer look at the probe
First, the Probe struct:
- type Probe struct {
- // The action taken to determine the health of a container
- Handler `json:",inline" protobuf:"bytes,1,opt,name=handler"`
- // Number of seconds after the container has started before liveness probes are initiated.
- // More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
- // +optional
- InitialDelaySeconds int32 `json:"initialDelaySeconds,omitempty" protobuf:"varint,2,opt,name=initialDelaySeconds"`
- // Number of seconds after which the probe times out.
- // Defaults to 1 second. Minimum value is 1.
- // More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
- // +optional
- TimeoutSeconds int32 `json:"timeoutSeconds,omitempty" protobuf:"varint,3,opt,name=timeoutSeconds"`
- // How often (in seconds) to perform the probe.
- // Default to 10 seconds. Minimum value is 1.
- // +optional
- PeriodSeconds int32 `json:"periodSeconds,omitempty" protobuf:"varint,4,opt,name=periodSeconds"`
- // Minimum consecutive successes for the probe to be considered successful after having failed.
- // Defaults to 1. Must be 1 for liveness. Minimum value is 1.
- // +optional
- SuccessThreshold int32 `json:"successThreshold,omitempty" protobuf:"varint,5,opt,name=successThreshold"`
- // Minimum consecutive failures for the probe to be considered failed after having succeeded.
- // Defaults to 3. Minimum value is 1.
- // +optional
- FailureThreshold int32 `json:"failureThreshold,omitempty" protobuf:"varint,6,opt,name=failureThreshold"`
- }
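initialDelaySeconds: how long to wait after the container starts before probing begins. The struct comment mentions liveness probes, but the field is set per probe and applies to readiness probes as well.
timeoutSeconds: the timeout for each probe attempt. Defaults to 1 second.
periodSeconds: how often the probe runs. Defaults to 10 seconds.
successThreshold: the minimum number of consecutive successes, after a failure, for the probe to be considered successful. Defaults to 1 and must be 1 for liveness.
failureThreshold: the minimum number of consecutive failures, after a success, for the probe to be considered failed. Defaults to 3.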
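To make the two thresholds concrete, here is a minimal sketch of how consecutive outcomes could be turned into a reported state. `thresholdTracker` and `observe` are hypothetical names for this example only. It models the rule stated in the field comments and is not the kubelet's worker code.

```go
package main

import "fmt"

// thresholdTracker is a hypothetical helper that converts raw probe outcomes
// into a reported state using consecutive-success and consecutive-failure counts.
type thresholdTracker struct {
	successThreshold int
	failureThreshold int
	lastOK           bool // outcome of the previous raw probe
	run              int  // length of the current run of identical outcomes
	healthy          bool // state currently reported
}

// observe records one raw probe outcome and returns the reported state.
// The reported state only flips once the current run reaches the threshold.
func (t *thresholdTracker) observe(ok bool) bool {
	if ok == t.lastOK {
		t.run++
	} else {
		t.lastOK = ok
		t.run = 1
	}
	if ok && t.run >= t.successThreshold {
		t.healthy = true
	}
	if !ok && t.run >= t.failureThreshold {
		t.healthy = false
	}
	return t.healthy
}

func main() {
	// Defaults from the Probe struct comments: successThreshold 1, failureThreshold 3.
	// Starting from a healthy state is an assumption made for the demo.
	t := &thresholdTracker{successThreshold: 1, failureThreshold: 3, healthy: true}
	for _, ok := range []bool{false, false, false, true} {
		fmt.Printf("probe ok=%v -> reported healthy=%v\n", ok, t.observe(ok))
	}
}
```

With the defaults, a single failed probe does not change the reported state. Three failures in a row are needed, while one success is enough to recover.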
Handler:
Both liveness and readiness probes support three kinds of handler: executing a command, an HTTP GET request, and a TCP connection.
- // Handler defines a specific action that should be taken
- // TODO: pass structured data to these actions, and document that data here.
- type Handler struct {
- // One and only one of the following should be specified.
- // Exec specifies the action to take.
- // +optional
- Exec *ExecAction `json:"exec,omitempty" protobuf:"bytes,1,opt,name=exec"`
- // HTTPGet specifies the http request to perform.
- // +optional
- HTTPGet *HTTPGetAction `json:"httpGet,omitempty" protobuf:"bytes,2,opt,name=httpGet"`
- // TCPSocket specifies an action involving a TCP port.
- // TCP hooks not yet supported
- // TODO: implement a realistic TCP lifecycle hook
- // +optional
- TCPSocket *TCPSocketAction `json:"tcpSocket,omitempty" protobuf:"bytes,3,opt,name=tcpSocket"`
- }
Next, look at the runProbe method in prober.go.
- func (pb *prober) runProbe(probeType probeType, p *v1.Probe, pod *v1.Pod, status v1.PodStatus, container v1.Container, containerID kubecontainer.ContainerID) (probe.Result, string, error) {
- timeout := time.Duration(p.TimeoutSeconds) * time.Second
- if p.Exec != nil {
- glog.V(4).Infof("Exec-Probe Pod: %v, Container: %v, Command: %v", pod, container, p.Exec.Command)
- command := kubecontainer.ExpandContainerCommandOnlyStatic(p.Exec.Command, container.Env)
- return pb.exec.Probe(pb.newExecInContainer(container, containerID, command, timeout))
- }
- if p.HTTPGet != nil {
- scheme := strings.ToLower(string(p.HTTPGet.Scheme))
- host := p.HTTPGet.Host
- if host == "" {
- host = status.PodIP
- }
- port, err := extractPort(p.HTTPGet.Port, container)
- if err != nil {
- return probe.Unknown, "", err
- }
- path := p.HTTPGet.Path
- glog.V(4).Infof("HTTP-Probe Host: %v://%v, Port: %v, Path: %v", scheme, host, port, path)
- url := formatURL(scheme, host, port, path)
- headers := buildHeader(p.HTTPGet.HTTPHeaders)
- glog.V(4).Infof("HTTP-Probe Headers: %v", headers)
- if probeType == liveness {
- return pb.livenessHttp.Probe(url, headers, timeout)
- } else { // readiness
- return pb.readinessHttp.Probe(url, headers, timeout)
- }
- }
- if p.TCPSocket != nil {
- port, err := extractPort(p.TCPSocket.Port, container)
- if err != nil {
- return probe.Unknown, "", err
- }
- host := p.TCPSocket.Host
- if host == "" {
- host = status.PodIP
- }
- glog.V(4).Infof("TCP-Probe Host: %v, Port: %v, Timeout: %v", host, port, timeout)
- return pb.tcp.Probe(host, port, timeout)
- }
- glog.Warningf("Failed to find probe builder for container: %v", container)
- return probe.Unknown, "", fmt.Errorf("Missing probe handler for %s:%s", format.Pod(pod), container.Name)
- }
1. Exec (command) probe
newExecInContainer calls the CRI to run the command inside the container:
- // ExecAction describes a "run in container" action.
- type ExecAction struct {
- // Command is the command line to execute inside the container, the working directory for the
- // command is root ('/') in the container's filesystem. The command is simply exec'd, it is
- // not run inside a shell, so traditional shell instructions ('|', etc) won't work. To use
- // a shell, you need to explicitly call out to that shell.
- // Exit status of 0 is treated as live/healthy and non-zero is unhealthy.
- // +optional
- Command []string `json:"command,omitempty" protobuf:"bytes,1,rep,name=command"`
- }
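The exit-status rule in the comment above (0 is healthy, non-zero is unhealthy) and the "no shell" note can be illustrated with a small local sketch. Note the assumptions: `execProbe` is a hypothetical function, it runs the command on the local machine with `os/exec` rather than inside a container through the CRI, and the demo assumes an `sh` binary is available.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// execProbe runs command directly (no shell is involved) and reports whether
// it exited with status 0. A non-zero exit is "unhealthy" with a nil error;
// an error is returned only when the command could not be run at all.
func execProbe(command []string, timeout time.Duration) (bool, error) {
	if len(command) == 0 {
		return false, errors.New("empty command")
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	err := exec.CommandContext(ctx, command[0], command[1:]...).Run()
	if err == nil {
		return true, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The command ran but did not exit cleanly (non-zero status or killed on timeout).
		return false, nil
	}
	return false, err
}

func main() {
	// Shell syntax such as pipes only works when a shell is invoked explicitly.
	ok, err := execProbe([]string{"sh", "-c", "echo ready | grep -q ready"}, 5*time.Second)
	fmt.Println("healthy:", ok, "err:", err)

	ok, err = execProbe([]string{"sh", "-c", "exit 3"}, 5*time.Second)
	fmt.Println("healthy:", ok, "err:", err)
}
```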
2. HTTP GET probe
Probes the container with an HTTP GET request.
Port: the container port to access
Host: the host to connect to; defaults to the Pod IP
- // HTTPGetAction describes an action based on HTTP Get requests.
- type HTTPGetAction struct {
- // Path to access on the HTTP server.
- // +optional
- Path string `json:"path,omitempty" protobuf:"bytes,1,opt,name=path"`
- // Name or number of the port to access on the container.
- // Number must be in the range 1 to 65535.
- // Name must be an IANA_SVC_NAME.
- Port intstr.IntOrString `json:"port" protobuf:"bytes,2,opt,name=port"`
- // Host name to connect to, defaults to the pod IP. You probably want to set
- // "Host" in httpHeaders instead.
- // +optional
- Host string `json:"host,omitempty" protobuf:"bytes,3,opt,name=host"`
- // Scheme to use for connecting to the host.
- // Defaults to HTTP.
- // +optional
- Scheme URIScheme `json:"scheme,omitempty" protobuf:"bytes,4,opt,name=scheme,casttype=URIScheme"`
- // Custom headers to set in the request. HTTP allows repeated headers.
- // +optional
- HTTPHeaders []HTTPHeader `json:"httpHeaders,omitempty" protobuf:"bytes,5,rep,name=httpHeaders"`
- }
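As a rough illustration of an HTTP GET probe, the sketch below sends one request with a timeout and custom headers and treats any status code from 200 up to, but not including, 400 as success, which is the rule the Kubernetes documentation gives for HTTP probes. `httpProbe` and `newDemoServer` are hypothetical names for this example, and the paths `/healthz` and `/broken` are made up for the demo server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// httpProbe issues a single GET request and reports success when the
// response status code is in the range [200, 400).
func httpProbe(url string, headers http.Header, timeout time.Duration) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	for name, values := range headers {
		for _, v := range values {
			req.Header.Add(name, v)
		}
	}
	client := &http.Client{Timeout: timeout}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= http.StatusOK && resp.StatusCode < http.StatusBadRequest, nil
}

// newDemoServer starts a local test server with one healthy and one failing path.
func newDemoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/broken", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	ok, err := httpProbe(srv.URL+"/healthz", nil, time.Second)
	fmt.Println("healthz:", ok, err)

	ok, err = httpProbe(srv.URL+"/broken", nil, time.Second)
	fmt.Println("broken:", ok, err)
}
```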
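3. TCP probe
Setting a port, and optionally a host, is enough to probe over TCP. The probe succeeds if the connection can be opened.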
- // TCPSocketAction describes an action based on opening a socket
- type TCPSocketAction struct {
- // Number or name of the port to access on the container.
- // Number must be in the range 1 to 65535.
- // Name must be an IANA_SVC_NAME.
- Port intstr.IntOrString `json:"port" protobuf:"bytes,1,opt,name=port"`
- // Optional: Host name to connect to, defaults to the pod IP.
- // +optional
- Host string `json:"host,omitempty" protobuf:"bytes,2,opt,name=host"`
- }
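A TCP probe boils down to trying to open a connection within the timeout. The sketch below shows that idea with the standard library. `tcpProbe` and `listenLocal` are hypothetical names used only here, and the loopback listener exists just to give the demo something to connect to.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// tcpProbe reports success if a TCP connection to host:port can be opened
// within timeout. The connection is closed immediately afterwards.
func tcpProbe(host string, port int, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, strconv.Itoa(port)), timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// listenLocal opens a listener on a free loopback port and returns it with the port number.
func listenLocal() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := listenLocal()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("listener up:", tcpProbe("127.0.0.1", port, time.Second))

	ln.Close()
	fmt.Println("listener closed:", tcpProbe("127.0.0.1", port, time.Second))
}
```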
A thought experiment: what happens if all three handler types are set on the same probe?
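Thoughts
When deploying production applications on k8s, configure both liveness and readiness probes. This is a best practice for keeping services stable.
Also, because Pod Ready does not guarantee that the actual business application is ready to serve, Kubernetes provides the Pod Readiness Gates feature, which became stable in 1.14. With it, the Pod is marked Ready only after the application-level conditions are also met.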
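The effect of readiness gates can be modelled in a few lines. This is a simplified sketch with local stand-in types, not the real API types or the kubelet's logic: `podReady` and `conditionTrue` are hypothetical names, and the gate name in the demo is made up. It assumes the documented rule that a Pod counts as Ready only when all containers are ready and every gate condition has status "True", with a missing condition treated as not ready.

```go
package main

import "fmt"

// PodCondition is a simplified stand-in for the API type of the same name.
type PodCondition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// conditionTrue reports whether the named condition exists with status "True".
// A missing condition counts as not ready.
func conditionTrue(conditions []PodCondition, condType string) bool {
	for _, c := range conditions {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// podReady reports whether a Pod counts as Ready in this simplified model:
// every container is ready and every readiness gate condition is "True".
func podReady(containersReady []bool, gates []string, conditions []PodCondition) bool {
	for _, ready := range containersReady {
		if !ready {
			return false
		}
	}
	for _, gate := range gates {
		if !conditionTrue(conditions, gate) {
			return false
		}
	}
	return true
}

func main() {
	gates := []string{"example.com/load-balancer-attached"} // hypothetical gate name
	containers := []bool{true, true}

	// Containers are ready, but the external condition has not been reported yet.
	fmt.Println(podReady(containers, gates, nil))

	// An external controller has set the condition to "True".
	conds := []PodCondition{{Type: gates[0], Status: "True"}}
	fmt.Println(podReady(containers, gates, conds))
}
```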
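Conclusion
Back to the thought experiment above: what happens if all three handler types are set on the same probe?
Answer: if more than one handler is set on a probe, the configuration is rejected at submission time with a "may not specify more than 1 handler type" error.
Continuing with the source code: the restriction is enforced by validateHandler in validation.go, which also backs up the "One and only one of the following should be specified." comment on the Handler struct above.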
- func validateHandler(handler *core.Handler, fldPath *field.Path) field.ErrorList {
- numHandlers := 0
- allErrors := field.ErrorList{}
- if handler.Exec != nil {
- if numHandlers > 0 {
- allErrors = append(allErrors, field.Forbidden(fldPath.Child("exec"), "may not specify more than 1 handler type"))
- } else {
- numHandlers++
- allErrors = append(allErrors, validateExecAction(handler.Exec, fldPath.Child("exec"))...)
- }
- }
- if handler.HTTPGet != nil {
- if numHandlers > 0 {
- allErrors = append(allErrors, field.Forbidden(fldPath.Child("httpGet"), "may not specify more than 1 handler type"))
- } else {
- numHandlers++
- allErrors = append(allErrors, validateHTTPGetAction(handler.HTTPGet, fldPath.Child("httpGet"))...)
- }
- }
- if handler.TCPSocket != nil {
- if numHandlers > 0 {
- allErrors = append(allErrors, field.Forbidden(fldPath.Child("tcpSocket"), "may not specify more than 1 handler type"))
- } else {
- numHandlers++
- allErrors = append(allErrors, validateTCPSocketAction(handler.TCPSocket, fldPath.Child("tcpSocket"))...)
- }
- }
- if numHandlers == 0 {
- allErrors = append(allErrors, field.Required(fldPath, "must specify a handler type"))
- }
- return allErrors
- }
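The same "exactly one handler" rule can be reproduced in a stripped-down form to see how the three cases behave. The types below are local stand-ins with only a few fields, and `countHandlers` and `validateExactlyOne` are hypothetical helpers written for this example. Unlike validateHandler, the sketch returns a single error and skips validation of the individual actions.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the API types; only whether a pointer is nil matters here.
type ExecAction struct {
	Command []string
}

type HTTPGetAction struct {
	Path string
	Port int
}

type TCPSocketAction struct {
	Port int
}

type Handler struct {
	Exec      *ExecAction
	HTTPGet   *HTTPGetAction
	TCPSocket *TCPSocketAction
}

// countHandlers returns how many handler types are set.
func countHandlers(h Handler) int {
	n := 0
	if h.Exec != nil {
		n++
	}
	if h.HTTPGet != nil {
		n++
	}
	if h.TCPSocket != nil {
		n++
	}
	return n
}

// validateExactlyOne rejects a handler with zero or with several types set.
// The messages reuse the wording from validateHandler.
func validateExactlyOne(h Handler) error {
	switch n := countHandlers(h); {
	case n == 0:
		return errors.New("must specify a handler type")
	case n > 1:
		return errors.New("may not specify more than 1 handler type")
	}
	return nil
}

func main() {
	all := Handler{
		Exec:      &ExecAction{Command: []string{"cat", "/tmp/healthy"}},
		HTTPGet:   &HTTPGetAction{Path: "/healthz", Port: 8080},
		TCPSocket: &TCPSocketAction{Port: 8080},
	}
	fmt.Println("all three set:", validateExactlyOne(all))
	fmt.Println("none set:", validateExactlyOne(Handler{}))
	fmt.Println("only tcp:", validateExactlyOne(Handler{TCPSocket: &TCPSocketAction{Port: 8080}}))
}
```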
Source: https://yq.aliyun.com/articles/702633