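I've recently been researching options for a big-data monitoring platform and tried several open-source solutions. Today I'll share the full deployment process for a monitoring platform based on Telegraf+InfluxDB+Grafana. The article starts with a brief introduction to the TICK stack, then walks through installing and deploying each component used in this setup. I hope it helps anyone who is evaluating big-data monitoring platforms or is interested in monitoring systems.
Data in this kind of monitoring platform is typically time-series data, so it is best stored in a time-series database (TSDB). Mainstream TSDBs today include InfluxDB, OpenTSDB, Graphite, and TimescaleDB. Among them, InfluxDB is one of the most widely used in the monitoring field, and a complete open-source solution is built around it: the TICK Stack, shown in the figure below: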
The TICK Stack is InfluxData's integrated solution covering metric collection, storage, visualization, and alerting. It consists of four core components:
- Telegraf: Time-Series Data Collector
- InfluxDB: Time-Series Data Storage
- Chronograf: Time-Series Data Visualization
- Kapacitor: Time-Series Data Processing
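Here we use Telegraf and InfluxDB from the TICK Stack together with Grafana, another widely used visualization tool. This is the Telegraf+InfluxDB+Grafana combination mentioned above, and we use it to monitor the basic metrics of our big-data platform, including but not limited to CPU, memory, network, disk, and disk I/O (CPU/Mem/Net/Disk/Diskio). The rest of the article covers installing and deploying each component.
I. InfluxDB
InfluxDB is currently the most popular open-source time-series database in areas such as IoT monitoring and DevOps monitoring, and it is the core component of the TICK Stack.
Advantages: written in Go, with no external dependencies.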
1 Install InfluxDB
- # wget https://dl.influxdata.com/influxdb/releases/influxdb-1.7.7.x86_64.rpm
- # yum install -y influxdb-1.7.7.x86_64.rpm
2 Start InfluxDB
# systemctl start influxdb
3 Work with InfluxDB
The following session creates a database named "telegraf", an ordinary user named "telegraf", and an admin user named "admin":
- # influx
- Connected to http://localhost:8086 version 1.7.7
- InfluxDB shell version: 1.7.7
- > create database telegraf
- > show databases
- name: databases
- name
- ----
- _internal
- telegraf
- > create user "admin" with password 'admin' with all privileges
- > create user "telegraf" with password 'telegraf'
- > show users;
- user     admin
- ----     -----
- telegraf false
- admin    true
- > exit
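The same statements can also be sent through the InfluxDB 1.x HTTP API, which is convenient when scripting the setup. Below is a minimal Python sketch that uses only the standard library. It assumes InfluxDB listens on the default http://127.0.0.1:8086 and that authentication is still disabled, which is the default `auth-enabled = false` in the `[http]` section of influxdb.conf.

The `GRANT WRITE` statement is an addition that is not part of the original steps. A non-admin user has no database privileges by default, so once authentication is enabled the "telegraf" user needs write access to the "telegraf" database. The helper name `build_query_request` is hypothetical.

```python
import urllib.parse
import urllib.request

INFLUX_URL = "http://127.0.0.1:8086"  # assumption: default InfluxDB 1.x HTTP address

# The statements from the influx session above, plus a GRANT (an addition, not in
# the original steps) so the non-admin "telegraf" user can write once auth is on.
SETUP_STATEMENTS = [
    'CREATE DATABASE "telegraf"',
    "CREATE USER \"admin\" WITH PASSWORD 'admin' WITH ALL PRIVILEGES",
    "CREATE USER \"telegraf\" WITH PASSWORD 'telegraf'",
    'GRANT WRITE ON "telegraf" TO "telegraf"',
]


def build_query_request(base_url: str, statement: str) -> urllib.request.Request:
    """Build a POST /query request; InfluxDB 1.x requires POST for CREATE/GRANT etc."""
    body = urllib.parse.urlencode({"q": statement}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Send each statement separately and print InfluxDB's JSON reply.
    for stmt in SETUP_STATEMENTS:
        with urllib.request.urlopen(build_query_request(INFLUX_URL, stmt)) as resp:
            print(stmt, "->", resp.status, resp.read().decode("utf-8"))
```

Sending one statement per request keeps any error message tied to the statement that caused it.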
4 Check the InfluxDB configuration
- # more /etc/influxdb/influxdb.conf
- ...
- [data]
- # The directory where the TSM storage engine stores TSM files.
- dir = "/var/lib/influxdb/data"
- # The directory where the TSM storage engine stores WAL files.
- wal-dir = "/var/lib/influxdb/wal"
- ...
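II. Telegraf
Telegraf is a lightweight, plugin-driven agent that collects metrics from systems and services. It supports many input and output plugins. Input plugins can read operating-system metrics directly, pull metrics from third-party APIs, and even consume metrics via statsd and Kafka. Output plugins can send the collected metrics to various data stores, services, or message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, and MQTT.
Advantages: written in Go, with no external dependencies.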
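When Telegraf writes to InfluxDB, each metric is sent in InfluxDB's line protocol: `measurement,tag_key=tag_value field_key=field_value timestamp`. The minimal Python sketch below makes that format concrete. It is a simplified illustration, not Telegraf's own serializer, and the function name `to_line` is hypothetical. It covers only the common rules:

- commas, spaces and equals signs are escaped in keys and tag values;
- double quotes and backslashes are escaped in string field values;
- integers get the `i` suffix.

```python
def _escape(s: str, chars: str) -> str:
    """Backslash-escape every character listed in `chars` (in order)."""
    for ch in chars:
        s = s.replace(ch, "\\" + ch)
    return s


def _field_value(v) -> str:
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integer fields carry an "i" suffix
    if isinstance(v, float):
        return repr(v)
    # String fields are double-quoted; escape backslashes before quotes.
    return '"' + _escape(str(v), '\\"') + '"'


def to_line(measurement: str, tags: dict, fields: dict, ts_ns=None) -> str:
    """Format one point as InfluxDB line protocol (simplified sketch)."""
    parts = [_escape(measurement, ", ")]
    for k in sorted(tags):  # InfluxDB recommends sorting tags by key
        parts.append(f"{_escape(k, ', =')}={_escape(str(tags[k]), ', =')}")
    head = ",".join(parts)
    body = ",".join(f"{_escape(k, ', =')}={_field_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if ts_ns is not None:
        line += f" {ts_ns}"  # nanosecond timestamp; omit to let the server assign one
    return line


if __name__ == "__main__":
    print(to_line("cpu", {"host": "ali-rds-kafka.novalocal", "cpu": "cpu-total"},
                  {"usage_idle": 98.08, "usage_user": 1.74}, 1563430490000000000))
```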
1 Install Telegraf
- # wget https://dl.influxdata.com/telegraf/releases/telegraf-1.11.2-1.x86_64.rpm
- # yum install -y telegraf-1.11.2-1.x86_64.rpm
2 Configure Telegraf by editing the outputs.influxdb section:
- # vi /etc/telegraf/telegraf.conf
- [[outputs.influxdb]]
- ## The full HTTP or UDP URL for your InfluxDB instance.
- ##
- ## Multiple URLs can be specified for a single cluster, only ONE of the
- ## urls will be written to each interval.
- # urls = ["unix:///var/run/influxdb.sock"]
- # urls = ["udp://127.0.0.1:8089"]
- urls = ["http://127.0.0.1:8086"]
- ## The target database for metrics; will be created as needed.
- ## For UDP url endpoint database needs to be configured on server side.
- database = "telegraf"
- ## The value of this tag will be used to determine the database. If this
- ## tag is not set the 'database' option is used as the default.
- # database_tag = ""
- ## If true, no CREATE DATABASE queries will be sent. Set to true when using
- ## Telegraf with a user without permissions to create databases or when the
- ## database already exists.
- # skip_database_creation = false
- ## Name of existing retention policy to write to. Empty string writes to
- ## the default retention policy. Only takes effect when using HTTP.
- # retention_policy = ""
- ## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
- ## Only takes effect when using HTTP.
- # write_consistency = "any"
- ## Timeout for HTTP messages.
- timeout = "5s"
- ## HTTP Basic Auth
- username = "telegraf"
- password = "telegraf"
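With this configuration, Telegraf sends batches of line-protocol points as HTTP POST requests to InfluxDB's `/write` endpoint. Each request carries `db=telegraf` and the configured username and password as HTTP Basic auth. Building an equivalent request by hand can be a quick way to check that the database and credentials work before starting Telegraf.

The minimal Python sketch below does that. The URL, database and credentials mirror the config above. The function name `build_write_request` and the sample point are illustrative assumptions, and this is not Telegraf's actual code.

```python
import base64
import urllib.parse
import urllib.request


def build_write_request(base_url, database, username, password, lines):
    """Build a POST /write request carrying line-protocol points (ns precision by default)."""
    query = urllib.parse.urlencode({"db": database})
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/write?{query}",
        data="\n".join(lines).encode("utf-8"),  # one point per line
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = build_write_request(
        "http://127.0.0.1:8086", "telegraf", "telegraf", "telegraf",
        ["manual_test,host=localhost value=1i"],  # hypothetical test point
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # InfluxDB 1.x answers a successful write with 204 No Content
```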
3 Start Telegraf
# systemctl start telegraf
4 Check the data in InfluxDB
- # influx -database telegraf
- Connected to http://localhost:8086 version 1.7.7
- InfluxDB shell version: 1.7.7
- > SELECT * FROM "cpu" limit 10
- name: cpu
- time cpu host usage_guest usage_guest_nice usage_idle usage_iowait usage_irq usage_nice usage_softirq usage_steal usage_system usage_user
- ---- --- ---- ----------- ---------------- ---------- ------------ --------- ---------- ------------- ----------- ------------ ----------
- 1563430490000000000 cpu-total ali-rds-kafka.novalocal 0 0 98.08294699768652 0 0 0 0 0 0.17541661445337134 1.7416363863649844
- 1563430490000000000 cpu0 ali-rds-kafka.novalocal 0 0 98.19819820155767 0 0 0 0 0 0.2002002001582113 1.6016016012656904
- 1563430490000000000 cpu1 ali-rds-kafka.novalocal 0 0 92.18436872588022 0 0 0 0 0 0.20040080159860416 7.6152304605829215
- 1563430490000000000 cpu2 ali-rds-kafka.novalocal 0 0 98.99598392124761 0 0 0 0.10040160637911746 0 0.30120481914398695 0.602409638269711
- 1563430490000000000 cpu3 ali-rds-kafka.novalocal 0 0 99.29789367823233 0 0 0 0 0 0.10030090268482908 0.6018054160907298
- 1563430490000000000 cpu4 ali-rds-kafka.novalocal 0 0 99.29789367796998 0 0 0 0 0 0.1003009027223065 0.6018054163155944
- 1563430490000000000 cpu5 ali-rds-kafka.novalocal 0 0 98.99899898391868 0 0 0 0 0 0.20020020023286633 0.8008008009314653
- 1563430490000000000 cpu6 ali-rds-kafka.novalocal 0 0 99.09909910044288 0 0 0 0 0 0.20020020023741836 0.7007007008127561
- 1563430490000000000 cpu7 ali-rds-kafka.novalocal 0 0 98.4969940029743 0 0 0 0 0 0.30060120238879307 1.2024048095642854
- 1563430500000000000 cpu-total ali-rds-kafka.novalocal 0 0 99.54954956886654 0 0 0 0.01251251251458971 0 0.10010010011870918 0.33783783789836747
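The `time` column is a Unix timestamp in nanoseconds, InfluxDB's default precision. Running `precision rfc3339` inside the `influx` shell makes it print readable timestamps instead. Outside the shell, a small Python sketch like the one below can convert them. The function name `ns_to_utc` is hypothetical.

The two distinct timestamps in the output above are 10 seconds apart. This is consistent with Telegraf's default collection `interval = "10s"`, assuming the default agent settings were left unchanged (an inference from the sample output).

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def ns_to_utc(ts_ns: int) -> datetime:
    """Convert an InfluxDB nanosecond epoch timestamp to an aware UTC datetime.

    Integer division avoids float rounding; digits below one microsecond are
    dropped because datetime only has microsecond resolution.
    """
    seconds, remainder = divmod(ts_ns, NS_PER_SECOND)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=remainder // 1000)


if __name__ == "__main__":
    first, second = 1563430490000000000, 1563430500000000000  # from the query output above
    print(ns_to_utc(first).isoformat())  # 2019-07-18T06:14:50+00:00
    print((second - first) // NS_PER_SECOND, "seconds between samples")  # 10
```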
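Note: InfluxDB's built-in web admin UI was deprecated in 1.1 and removed in 1.3, so trying to open the web UI the old way returns "404 page not found". If you want a web interface for InfluxDB, use a third-party tool or the web UI of an older InfluxDB version.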
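Without the built-in web UI, the HTTP API on port 8086 is still a lightweight way to inspect data. The minimal Python sketch below assumes the default address and the telegraf credentials created earlier. The credentials are passed as the `u`/`p` query parameters, which InfluxDB 1.x accepts as an alternative to Basic auth. The `epoch=s` parameter asks for second-precision Unix timestamps.

The helper names `query_url` and `flatten` are hypothetical. `flatten` turns the `results`/`series` JSON structure into one dict per row.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, database, q, username=None, password=None):
    """Build a GET /query URL; epoch=s returns Unix timestamps in seconds."""
    params = {"db": database, "q": q, "epoch": "s"}
    if username is not None and password is not None:
        params.update(u=username, p=password)
    return f"{base_url}/query?{urllib.parse.urlencode(params)}"


def flatten(response):
    """Return one dict per row from an InfluxDB 1.x /query JSON response."""
    rows = []
    for result in response.get("results", []):
        if "error" in result:
            raise RuntimeError(result["error"])
        for series in result.get("series", []):
            for values in series["values"]:
                row = dict(zip(series["columns"], values))
                row["measurement"] = series["name"]
                row.update(series.get("tags", {}))  # present only with GROUP BY
                rows.append(row)
    return rows


if __name__ == "__main__":
    url = query_url("http://127.0.0.1:8086", "telegraf",
                    "SELECT usage_idle FROM cpu WHERE cpu = 'cpu-total' ORDER BY time DESC LIMIT 5",
                    "telegraf", "telegraf")
    with urllib.request.urlopen(url) as resp:
        for row in flatten(json.load(resp)):
            print(row)
```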
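III. Grafana
Grafana is a popular open-source visualization tool. It supports many data sources, including mainstream time-series databases such as InfluxDB, OpenTSDB, Graphite, Prometheus, and Elasticsearch, as well as relational databases such as MySQL and PostgreSQL.
Advantages: written in Go, with built-in user management, alerting, and more.
1 Install Grafana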
- # wget https://dl.grafana.com/oss/release/grafana-6.2.5-1.x86_64.rpm
- # yum install -y grafana-6.2.5-1.x86_64.rpm
2 Start Grafana
# systemctl start grafana-server
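3 Access Grafana
Grafana's default HTTP port is 3000 and the default admin username/password is admin/admin, so you only need to open http://IP:3000. On first login you will be prompted to change the password. The home page looks like this: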
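To confirm from a script that Grafana is up, you can call its health endpoint, `GET /api/health`, which needs no login. It returns a small JSON document that includes a `database` status and the Grafana `version`. The endpoint exists in recent Grafana releases, but check the HTTP API docs for your version.

The minimal Python sketch below assumes Grafana runs locally on the default port. `is_healthy` is a hypothetical helper name.

```python
import json
import urllib.request

GRAFANA_URL = "http://127.0.0.1:3000"  # assumption: Grafana on this host, default http_port


def is_healthy(health: dict) -> bool:
    """Grafana reports "database": "ok" when its backing database is reachable."""
    return health.get("database") == "ok"


if __name__ == "__main__":
    with urllib.request.urlopen(f"{GRAFANA_URL}/api/health") as resp:
        health = json.load(resp)
    print(health.get("version"), "healthy" if is_healthy(health) else "unhealthy")
```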
4 Check the Grafana configuration
- # more /etc/grafana/grafana.ini
- ...
- [paths]
- # Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
- ;data = /var/lib/grafana
- # Temporary files in `data` directory older than given duration will be removed
- ;temp_data_lifetime = 24h
- # Directory where grafana can store logs
- ;logs = /var/log/grafana
- ...
- # The http port to use
- ;http_port = 3000
- ...
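5 Configure Grafana to read from InfluxDB in the web UI
After logging in to Grafana, first add a data source: Data Sources --> Add data source. Choose InfluxDB as the type and fill in its URL (e.g. http://IP:8086), the telegraf database, and the telegraf credentials. Then create a dashboard: Dashboards --> Manage --> New dashboard. After configuring a few panels, the data is displayed. The UI is straightforward, so I won't go into detail; explore the Grafana interface further on your own.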
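The data-source step can also be scripted through Grafana's HTTP API (`POST /api/datasources`), authenticating as the admin user with Basic auth. The minimal Python sketch below mirrors the manual setup and points Grafana at the local InfluxDB `telegraf` database with the `telegraf` user.

Assumptions in this sketch:

- The data source name `InfluxDB-telegraf` is made up.
- `ADMIN_PASSWORD` is a placeholder for the password you set at first login.
- `access: proxy` is chosen, so Grafana's backend queries InfluxDB.
- The InfluxDB password is sent as the plain `password` field. Newer Grafana versions store it under `secureJsonData` instead, so check the API docs for your version.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://127.0.0.1:3000"             # assumption: default Grafana address
ADMIN_USER, ADMIN_PASSWORD = "admin", "changeme"  # hypothetical: password set at first login


def datasource_payload(name: str = "InfluxDB-telegraf") -> dict:
    """JSON body for an InfluxDB data source pointing at the local telegraf database."""
    return {
        "name": name,  # hypothetical display name
        "type": "influxdb",
        "access": "proxy",  # Grafana's backend talks to InfluxDB
        "url": "http://127.0.0.1:8086",
        "database": "telegraf",
        "user": "telegraf",
        "password": "telegraf",  # older field; newer Grafana uses secureJsonData.password
        "isDefault": True,
    }


def build_datasource_request(payload: dict) -> urllib.request.Request:
    token = base64.b64encode(f"{ADMIN_USER}:{ADMIN_PASSWORD}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    with urllib.request.urlopen(build_datasource_request(datasource_payload())) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```

Scripting this step makes it repeatable across environments. Dashboards are still easiest to build interactively in the UI.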
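At this point we have walked through installing and using each component and successfully visualized the collected metrics. This article introduced the TICK Stack and built a monitoring platform based on Telegraf+InfluxDB+Grafana.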
Source: http://www.tuicool.com/articles/imaaYze