An Introduction to Prometheus, the Open-Source Monitoring System

Preface

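Prometheus (https://github.com/prometheus/prometheus) is an open-source project hosted by the CNCF. Inspired by Google's Borgmon monitoring system, it monitors systems and services: it periodically scrapes metrics, evaluates rule expressions, displays the results, and sends alerts when specified conditions are met.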

Features

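The main features that set Prometheus apart from other monitoring systems are:

A multi-dimensional data model (a time series is identified by a metric name and a set of key-value labels)

A flexible query language (PromQL)

No dependency on distributed storage; each server is an autonomous node.

Time series are collected by pulling over HTTP. A Pushgateway is also provided so that users can push data themselves, mainly for short-lived jobs.

Targets are discovered through static configuration or service discovery.

Multiple ways of graphing and displaying data are supported, for example the built-in web UI and Grafana.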

Scaling out is supported, for example through sharding and federation.
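To make the multi-dimensional data model more concrete, here is a minimal Python sketch that renders one sample in the Prometheus text exposition format, where a series is identified by a metric name plus key-value labels. The helper names `format_sample` and `escape_label_value` and the metric `http_requests_total` are illustrative assumptions, not part of the setup described in this article.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict, value) -> str:
    """Render one sample as `name{k="v",...} value` (hypothetical helper)."""
    if not labels:
        return f"{name} {value}"
    # Sort labels so the same label set always produces the same string.
    pairs = ",".join(
        f'{key}="{escape_label_value(str(val))}"'
        for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    # Two samples with the same metric name but different labels
    # are two distinct time series.
    print(format_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027))
    print(format_sample("http_requests_total", {"method": "POST", "code": "500"}, 3))
```

Each distinct combination of label values is its own series, which is what lets PromQL filter and aggregate along any label.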

Architecture

Components

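The Prometheus ecosystem consists of multiple components, most of which are optional.

Prometheus Server: scrapes and stores time series data, and provides the PromQL query language.

Pushgateway: lets short-lived jobs push their result data.

Exporter: the general term for collection components; exporters are the agents of the Prometheus ecosystem.

Alertmanager: handles alerts.

Client SDKs: the official SDKs support several languages, including Go, Java, and Python.

Most Prometheus components are written in Go, which makes them easy to build and deploy (the binaries have no external dependencies).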
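As a rough illustration of how a short-lived job might hand its results to the Pushgateway, the sketch below builds an HTTP request against the Pushgateway's `/metrics/job/<job_name>` endpoint. The gateway address `http://localhost:9091`, the job name `nightly_backup`, and the helper `build_push_request` are assumptions for illustration; the deployment later in this article does not include a Pushgateway.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway: str, job: str, metrics: dict) -> urllib.request.Request:
    """Build (but do not send) a request that pushes simple values for one job."""
    # One `name value` line per metric; the body must end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    # PUT replaces everything previously pushed under the same job grouping key.
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


if __name__ == "__main__":
    req = build_push_request(
        "http://localhost:9091", "nightly_backup", {"backup_duration_seconds": 42.5}
    )
    print(req.get_method(), req.full_url)
    # To actually send it against a running Pushgateway:
    # urllib.request.urlopen(req, timeout=5)
```

Prometheus then scrapes the Pushgateway like any other target, so the job itself can exit right after pushing.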

Workflow

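In this architecture, the Prometheus Server periodically scrapes data from the targets it obtains from the configuration file or from service discovery; each target must expose its data over an HTTP endpoint. The Prometheus Server aggregates the time series according to its rules and records them in its local database. Alerts that match the alerting conditions are pushed to Alertmanager, which sends notifications through the configured channels. The web UI or Grafana queries the data in the Prometheus Server with PromQL and renders it as graphs.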
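The requirement that every target expose its data over HTTP can be sketched with the Python standard library alone. The example below serves one gauge at `/metrics`; the metric name `demo_uptime_seconds`, port 8000, and the helper names are assumptions for illustration. A real exporter would normally use an official client SDK and listen on an address the Prometheus Server can reach.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


def render_metrics() -> str:
    """Return the current metrics in the text exposition format."""
    uptime = time.time() - START_TIME
    return (
        "# HELP demo_uptime_seconds Seconds since this process started.\n"
        "# TYPE demo_uptime_seconds gauge\n"
        f"demo_uptime_seconds {uptime:.3f}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_exporter(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_exporter(8000)
    print(f"serving on http://127.0.0.1:{server.server_port}/metrics")
    threading.Event().wait()  # block until the process is interrupted
```

The process never contacts Prometheus itself; it only answers scrapes, which is the pull model described above.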

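Suitable Scenarios

Prometheus works very well for recording purely numeric time series. It fits both machine performance data and service monitoring data. For microservices, its multi-dimensional data collection and query language are a particular strength.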

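Unsuitable Scenarios

The value of Prometheus lies in its reliability. It is not suited to scenarios that require 100% accuracy in statistical or analytical data, such as per-request billing, because the collected data may not be detailed or complete enough.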

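Hands-on Deployment

Below I deploy the whole Prometheus monitoring system with Docker Compose and use Grafana to display the data. If you are not yet familiar with Docker Compose, you can first read my earlier introductory article.

The docker-compose.yml for Prometheus is adapted from the open-source GitHub repository https://github.com/vegasbrianc/prometheus. Its contents are as follows: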

version: '3.1'
volumes:
  prometheus_data: {}
  grafana_data: {}
services:
  prometheus:
    image: prom/prometheus:v2.1.0
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    restart: always
  node-exporter:
    image: prom/node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - --collector.filesystem.ignored-mount-points
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
    ports:
      - 9100:9100
    restart: always
  alertmanager:
    image: prom/alertmanager
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    ports:
      - 9093:9093
    restart: always
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
  grafana:
    image: grafana/grafana
    user: "104"
    ports:
      - 3000:3000
    depends_on:
      - prometheus
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
      - ./grafana/config.monitoring
    restart: always

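As the docker-compose.yml above shows, Docker Compose deploys the Prometheus Server, Alertmanager, Grafana, and node exporter. The node exporter collects basic machine performance data such as CPU, memory, and disk, and exposes it over an HTTP endpoint for the Prometheus Server to scrape, store, and process. Grafana displays the data. Prometheus obtains the node exporter address from static configuration in its configuration file: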

$ cat prometheus.yml
 # my global config
 global:
   scrape_interval:     15s # By default, scrape targets every 15 seconds.
   evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
   # scrape_timeout is set to the global default (10s).
   # Attach these labels to any time series or alerts when communicating with
   # external systems (federation, remote storage, Alertmanager).
   external_labels:
       monitor: 'my-project'
 # Load and evaluate rules in this file every 'evaluation_interval' seconds.
 rule_files:
   - 'alert.rules'
   # - "first.rules"
   # - "second.rules"
 # alert
 alerting:
   alertmanagers:
   - scheme: http
     static_configs:
     - targets:
       - "alertmanager:9093"
 # Scrape configurations containing two jobs:
 # Prometheus itself and node-exporter.
 scrape_configs:
   # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
   - job_name: 'prometheus'
     # Override the global default and scrape targets from this job every 5 seconds.
     scrape_interval: 5s
     static_configs:
          - targets: ['localhost:9090']
   - job_name: 'node-exporter'
     # Override the global default and scrape targets from this job every 5 seconds.
     scrape_interval: 5s
     static_configs:
          - targets: ['node-exporter:9100']

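The last job under scrape_configs, node-exporter, sets the scrape address and interval for node exporter. Because Docker Compose resolves service names automatically, node-exporter:9100 can be used directly as the address.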

The list of scrape targets can be viewed through Prometheus on port 9090 (the Targets page of the web UI).
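The same target list is also available from the Prometheus HTTP API at `/api/v1/targets`. The sketch below summarizes such a response; the base URL `http://localhost:9090` assumes the port mapping from the compose file above is reachable from where the script runs, and the helper names are hypothetical.

```python
import json
import urllib.request


def summarize_targets(payload: dict) -> list:
    """Return (job, scrape URL, health) for each active target in an API response."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", ""),
            t.get("scrapeUrl", ""),
            t.get("health", "unknown"),
        )
        for t in targets
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Call GET /api/v1/targets on a running Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for job, url, health in summarize_targets(fetch_targets()):
        print(f"{job:15} {health:8} {url}")
```

With the configuration above, one would expect to see the prometheus and node-exporter jobs listed once both have been scraped.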

Grafana displays the data collected by node exporter; the dashboard template used here is https://grafana.com/dashboards/8919
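Dashboards like this are driven by PromQL queries sent to the Prometheus Server. As a hedged sketch, the code below builds the URL of an instant query against the `/api/v1/query` endpoint. The CPU usage expression and the metric name `node_cpu_seconds_total` are assumptions that depend on the node exporter version, and they are not taken from the dashboard above.

```python
import urllib.parse

# Assumed PromQL expression: percentage of non-idle CPU per instance over 5 minutes.
CPU_USAGE_QUERY = (
    '100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL of an instant query against GET /api/v1/query."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


if __name__ == "__main__":
    # Fetching this URL from a running server returns a JSON document
    # with the current value for each instance.
    print(build_query_url("http://localhost:9090", CPU_USAGE_QUERY))
```

Grafana issues comparable requests on behalf of each panel, usually as range queries so that it can draw a graph over time.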

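Summary

This article first covered the overall architecture and features of the Prometheus open-source monitoring system, then demonstrated how to set up the whole system with Docker Compose. In the next post I will show how to write an Exporter from scratch with the Go SDK provided by Prometheus. Stay tuned.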

References

https://prometheus.io/docs/introduction/overview/

Source: https://www.cnblogs.com/makelu/p/11069094.html
