GPU Metric-Based Autoscaling
In deep learning, a trained model is typically exposed through a Serving service. This article describes how to build a Serving service that scales elastically and automatically.

Kubernetes ships with the HPA (Horizontal Pod Autoscaler) for scaling containers, and by default it supports metrics such as CPU and memory. The native HPA is based on Heapster and does not support scaling on GPU metrics, but it can be extended through the Custom Metrics mechanism. We can deploy a Prometheus Adapter as a CustomMetricServer, which registers Prometheus metrics with the API server so that the HPA can query them. With the right configuration, the HPA can then use these custom metrics as its scaling targets and autoscale on GPU metrics.
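If you want a quick way to confirm later that the adapter registered the custom metrics group, a minimal check (assuming kubectl is pointed at the cluster) is to look for the group among the served API versions:

```bash
# Once the adapter and its APIService (deployed below) are up, the custom
# metrics group is served alongside the built-in API groups:
kubectl api-versions | grep custom.metrics.k8s.io
# expected: custom.metrics.k8s.io/v1beta1
```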
Prerequisites
You need a Container Service Kubernetes cluster, and you must have completed the GPU monitoring deployment described in Alibaba Cloud Container Service Kubernetes Monitoring - GPU Monitoring, which deploys Prometheus to collect GPU usage metrics. We will use the monitoring data in Prometheus as the reference metrics for autoscaling.
Note
Once the HPA is configured to scale on custom monitoring metrics, the native Heapster-based scaling on CPU and memory is no longer available.
Deployment
Log in to a master node and run the following script to generate the certificates for the Prometheus Adapter:
```bash
#!/usr/bin/env bash
set -e
set -o pipefail
set -u
b64_opts='--wrap=0'
# go get -v -u github.com/cloudflare/cfssl/cmd/...
export PURPOSE=metrics
openssl req -x509 -sha256 -new -nodes -days 365 -newkey rsa:2048 -keyout ${PURPOSE}-ca.key -out ${PURPOSE}-ca.crt -subj "/CN=ca"
echo '{"signing":{"default":{"expiry":"43800h","usages":["signing","key encipherment","'${PURPOSE}'"]}}}' > "${PURPOSE}-ca-config.json"

export SERVICE_NAME=custom-metrics-apiserver
export ALT_NAMES='"custom-metrics-apiserver.monitoring","custom-metrics-apiserver.monitoring.svc"'
echo "{\"CN\":\"${SERVICE_NAME}\", \"hosts\": [${ALT_NAMES}], \"key\": {\"algo\": \"rsa\",\"size\": 2048}}" | \
  cfssl gencert -ca=metrics-ca.crt -ca-key=metrics-ca.key -config=metrics-ca-config.json - | cfssljson -bare apiserver

cat <<-EOF > cm-adapter-serving-certs.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cm-adapter-serving-certs
data:
  serving.crt: $(base64 ${b64_opts} < apiserver.pem)
  serving.key: $(base64 ${b64_opts} < apiserver-key.pem)
EOF

kubectl -n kube-system apply -f cm-adapter-serving-certs.yaml
```
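Before moving on, it is worth confirming that the Secret exists; a quick check (assuming the script ran on a node with cluster-admin kubectl access):

```bash
# The Secret should contain the serving.crt and serving.key written above:
kubectl -n kube-system describe secret cm-adapter-serving-certs
```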
Deploy the Prometheus custom metrics adapter:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: custom-metrics-apiserver
  name: custom-metrics-apiserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      labels:
        app: custom-metrics-apiserver
      name: custom-metrics-apiserver
    spec:
      serviceAccountName: custom-metrics-apiserver
      containers:
      - name: custom-metrics-apiserver
        image: registry.cn-beijing.aliyuncs.com/test-hub/k8s-prometheus-adapter-amd64
        args:
        - --secure-port=6443
        - --tls-cert-file=/var/run/serving-cert/serving.crt
        - --tls-private-key-file=/var/run/serving-cert/serving.key
        - --logtostderr=true
        - --prometheus-url=http://prometheus-svc.kube-system.svc.cluster.local:9090/
        - --metrics-relist-interval=1m
        - --v=10
        - --config=/etc/adapter/config.yaml
        ports:
        - containerPort: 6443
        volumeMounts:
        - mountPath: /var/run/serving-cert
          name: volume-serving-cert
          readOnly: true
        - mountPath: /etc/adapter/
          name: config
          readOnly: true
        - mountPath: /tmp
          name: tmp-vol
      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs
      - name: config
        configMap:
          name: adapter-config
      - name: tmp-vol
        emptyDir: {}
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: custom-metrics-apiserver
---
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-server-resources
rules:
- apiGroups:
  - custom.metrics.k8s.io
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
data:
  config.yaml: |
    rules:
    - seriesQuery: '{uuid!=""}'
      resources:
        overrides:
          node_name: {resource: "node"}
          pod_name: {resource: "pod"}
          namespace_name: {resource: "namespace"}
      name:
        matches: ^nvidia_gpu_(.*)$
        as: "${1}_over_time"
      metricsQuery: ceil(avg_over_time(<<.Series>>{<<.LabelMatchers>>}[3m]))
    - seriesQuery: '{uuid!=""}'
      resources:
        overrides:
          node_name: {resource: "node"}
          pod_name: {resource: "pod"}
          namespace_name: {resource: "namespace"}
      name:
        matches: ^nvidia_gpu_(.*)$
        as: "${1}_current"
      metricsQuery: <<.Series>>{<<.LabelMatchers>>}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
```
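Save the manifest above to a file (custom-metrics-apiserver.yaml is an illustrative name) and apply it into kube-system, where the certificate Secret lives, then wait for the adapter pod to become Ready:

```bash
kubectl -n kube-system apply -f custom-metrics-apiserver.yaml
# Watch the adapter pod start up:
kubectl -n kube-system get pods -l app=custom-metrics-apiserver -w
```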
Grant the role authorizations. If the adapter was deployed in a namespace other than kube-system, change the namespace fields in the template below accordingly:
```yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: kube-system # change this if the adapter is deployed in a different namespace
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-resource-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system # change this if the adapter is deployed in a different namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system # change this if the adapter is deployed in a different namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
```
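After applying the APIService and the bindings, the aggregated API should report as available; a quick check:

```bash
# AVAILABLE should read True once the apiserver can reach the adapter:
kubectl get apiservice v1beta1.custom.metrics.k8s.io
```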
After the deployment completes, verify that the Prometheus Adapter works by calling the custom metrics API:
```
# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/temperature_celsius_current"
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/temperature_celsius_current"
  },
  "items": []
}
```
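An empty items list is expected while no GPU workload is running in the namespace. To enumerate every metric name the adapter currently registers (this sketch assumes jq is installed):

```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name'
```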
Modify the controller-manager configuration to use custom metrics as the HPA scaling source
Log in to each of the three master nodes and run the following command to update the kube-controller-manager's HPA configuration:
```bash
sed -i 's/--horizontal-pod-autoscaler-use-rest-clients=false/--horizontal-pod-autoscaler-use-rest-clients=true/g' /etc/kubernetes/manifests/kube-controller-manager.yaml
```
Verify the change:
```
# kubectl -n kube-system describe po -l component=kube-controller-manager | grep 'horizontal-pod-autoscaler-use-rest-clients'
      --horizontal-pod-autoscaler-use-rest-clients=true
      --horizontal-pod-autoscaler-use-rest-clients=true
      --horizontal-pod-autoscaler-use-rest-clients=true
```
Scaling metrics
At this point we have a Prometheus-backed custom metrics server. The adapter-config ConfigMap controls which Prometheus metrics are exposed to the API server.
The following GPU metrics are supported (note that the rename rules in adapter-config strip the nvidia_gpu_ prefix from the HPA-facing names):
| Prometheus metric | Meaning | HPA metric (current value) | HPA metric (3-minute average) |
| --- | --- | --- | --- |
| nvidia_gpu_duty_cycle | GPU utilization | duty_cycle_current | duty_cycle_over_time |
| nvidia_gpu_memory_total_bytes | Total GPU memory | memory_total_bytes_current | memory_total_bytes_over_time |
| nvidia_gpu_memory_used_bytes | GPU memory in use | memory_used_bytes_current | memory_used_bytes_over_time |
| nvidia_gpu_power_usage_milliwatts | GPU power usage | power_usage_milliwatts_current | power_usage_milliwatts_over_time |
| nvidia_gpu_temperature_celsius | GPU temperature | temperature_celsius_current | temperature_celsius_over_time |
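Any of these can be queried directly through the aggregated API. For example, to read the 3-minute average GPU utilization of pods in the default namespace:

```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/duty_cycle_over_time"
```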
Autoscaling on GPU metrics
Deploy the model-serving Service and Deployment:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: fast-style-transfer-serving
  labels:
    app: tensorflow-serving
spec:
  ports:
  - name: http-serving
    port: 5000
    targetPort: 5000
  selector:
    app: tensorflow-serving
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fast-style-transfer-serving
  labels:
    app: tensorflow-serving
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tensorflow-serving
    spec:
      containers:
      - name: serving
        image: "registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/fast-style-transfer-serving:la_muse"
        command: ["python", "app.py"]
        resources:
          limits:
            nvidia.com/gpu: 1
```
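Save and apply the manifest (the file name is illustrative), then confirm the serving pod lands on a GPU node:

```bash
kubectl apply -f fast-style-transfer-serving.yaml
kubectl get po -l app=tensorflow-serving -o wide
```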
Create an HPA that scales on a GPU metric:
```yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: gpu-hpa
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: fast-style-transfer-serving
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: duty_cycle_current # average GPU utilization (duty cycle) across the pods
      targetAverageValue: 40
```
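Apply it the same way (file name illustrative):

```bash
kubectl apply -f gpu-hpa.yaml
```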
Check the HPA and its current metric value:
```
# kubectl get hpa
NAME      REFERENCE                                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
gpu-hpa   Deployment/fast-style-transfer-serving   0/40      1         10        1          37s
```
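The TARGETS column is the value the HPA controller reads from the custom metrics API; you can issue the same query yourself:

```bash
# This is effectively the request the HPA controller makes for duty_cycle_current:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/duty_cycle_current"
```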
Deploy a fast-style-transfer load generator
This application continuously sends images to the serving endpoint to simulate load:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fast-style-transfer-press
  labels:
    app: fast-style-transfer-press
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: fast-style-transfer-press
    spec:
      containers:
      - name: serving
        image: "registry.cn-hangzhou.aliyuncs.com/xiaozhou/fast-style-transfer-press:v0"
        env:
        - name: SERVER_IP
          value: fast-style-transfer-serving
        - name: BATCH_SIZE
          value: "100"
        - name: TOTAL_SIZE
          value: "12000"
```
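Apply it to start generating load (file name illustrative):

```bash
kubectl apply -f fast-style-transfer-press.yaml
```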
Once the load generator is running, you can watch the metrics change on the [GPU Application Monitoring] dashboard.
The change is also visible through the HPA:
```
# kubectl get hpa
NAME      REFERENCE                                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
gpu-hpa   Deployment/fast-style-transfer-serving   63/40     1         10        1          3m
```
After the load has run for a while, you can see the pods scale out:
```
NAME                                           READY   STATUS    RESTARTS   AGE
fast-style-transfer-press-69c48966d8-dqf5n     1/1     Running   0          4m
fast-style-transfer-serving-84587c94b7-7xp2d   1/1     Running   0          5m
fast-style-transfer-serving-84587c94b7-slbdn   1/1     Running   0          47s
```
The scaled-out pods and their GPU metrics are also visible on the monitoring dashboard:
Stop the load generator
Run the following command to stop the load-testing application:
```bash
kubectl scale deploy fast-style-transfer-press --replicas=0 # scale the load generator down to 0
```
(You can also perform this scaling operation from the console.)
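To watch the HPA react as the load drains, you can leave a watch running:

```bash
# -w streams updates as the TARGETS value drops and replicas are scaled back in:
kubectl get hpa gpu-hpa -w
```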
Check on the HPA that the duty_cycle metric has dropped back to 0:
```
# kubectl get hpa
NAME      REFERENCE                                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
gpu-hpa   Deployment/fast-style-transfer-serving   0/40      1         10        3          9m
```
After some time, verify that the pods have been scaled back in:
```
# kubectl get po
NAME                                           READY   STATUS    RESTARTS   AGE
fast-style-transfer-serving-84587c94b7-7xp2d   1/1     Running   0          10m
```
Source: https://yq.aliyun.com/articles/655145