悟冥, 2019-05-14 22:07:34
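0. Links to the article series
SLS Machine Learning Introduction (01): Time Series Statistical Modeling
SLS Machine Learning Introduction (02): Time Series Clustering
SLS Machine Learning Introduction (03): Time Series Anomaly Detection
SLS Machine Learning Introduction (04): Rule and Pattern Mining
SLS Machine Learning Introduction (05): Time Series Forecasting
Hundreds of millions of logs at a glance: SLS intelligent clustering (LogReduce) released https://www.atatech.org/articles/125117
SLS Machine Learning Best Practices: Time Series Anomaly Detection and Alerting
SLS Machine Learning Best Practices: Time Series Forecasting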
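1. What tools do we have in hand?
Extracting more value from logs has always been our team's focus. On top of the existing real-time log query, SLS has rounded out the following DevOps features this year:
Context query
Real-time Tail and intelligent clustering, to make troubleshooting faster
A range of anomaly detection and forecasting functions for time series data, for smarter checks and predictions
Visualization of data analysis results
Powerful alert configuration and notification, with webhook calls to trigger follow-up actions
Today we focus on how intelligent log clustering and anomaly alerting work together to detect anomalies and raise alerts more effectively.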
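2. Platform experiment
2.1 Experiment data
The data is a set of raw Sys Log entries with the log clustering service enabled. The screenshot below shows the current state:
Adjusting the setting in red box 1 in the screenshot below changes the results shown in red box 2, but the finest-grained patterns do not change. In other words, sub-pattern results are stable and unique, so we can use a sub-pattern's signature to locate the corresponding raw log entries.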
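2.2 Generating a time series for a sub-pattern
Suppose we want to monitor this sub-pattern:
msg:vm-111932.tc su: pam_unix(*:session): session closed for user root
Its signature_id is: __log_signature__: 1814836459146662485
With the signature we retrieve the raw logs that match this pattern, and can look at a histogram of their counts along the time axis:
The figure above shows that logs of this pattern are not evenly distributed, and some time windows contain no logs at all. Counting directly per time window gives the following time series chart: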
    __log_signature__: 1814836459146662485 |
    select
      date_trunc('minute', __time__) as time,
      COUNT(*) as num
    from log GROUP BY time order by time ASC limit 10000
The chart above shows that the series is not continuous in time, so we need to fill in the missing points.
    __log_signature__: 1814836459146662485 |
    select
      time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
      avg(num) as num
    from (
      select
        __time__ - __time__ % 60 as time,
        COUNT(*) as num
      from log GROUP BY time order by time desc )
    GROUP by time order by time ASC limit 10000
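To make the gap-filling step concrete, here is a minimal Python sketch of the same idea outside SLS: floor each timestamp to its minute (the `__time__ - __time__ % 60` expression), count per minute, then emit a zero for every minute that has no logs. The function names and the sample timestamps are made up for illustration, and the sketch fills only between the first and last observed minute, which may differ from how `time_series` treats the query window.

```python
from collections import Counter


def bucket_to_minute(ts: int) -> int:
    """Floor a Unix timestamp (seconds) to the start of its minute."""
    return ts - ts % 60


def count_per_minute(timestamps):
    """Count events per minute bucket; minutes without events are absent."""
    return Counter(bucket_to_minute(ts) for ts in timestamps)


def fill_gaps(counts, step=60, fill=0):
    """Return (minute, count) pairs for every minute between the first and
    last observed bucket, using `fill` for minutes that have no events."""
    if not counts:
        return []
    start, end = min(counts), max(counts)
    return [(t, counts.get(t, fill)) for t in range(start, end + step, step)]


# Hypothetical event timestamps (seconds); nothing happens in minute 120.
events = [0, 5, 59, 60, 185, 190]
series = fill_gaps(count_per_minute(events))
print(series)
```

The minute starting at 120 has no events in the sample, so it appears in the result with a count of 0 instead of being skipped.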
2.3 Running anomaly detection on the time series
We use the time series anomaly detection function ts_predicate_arma:
    __log_signature__: 1814836459146662485 |
    select
      ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg')
    from (
      select
        time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
        avg(num) as num
      from (
        select
          __time__ - __time__ % 60 as time,
          COUNT(*) as num
        from log GROUP BY time order by time desc )
      GROUP by time order by time ASC ) limit 10000
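For intuition about what a band-based detector returns, the following Python sketch produces rows in the same shape used later in this article (unixtime, src, pred, up, lower, prob). It is a deliberately simplified stand-in, not the ARMA model behind ts_predicate_arma: the prediction is just the mean of the previous `window` points and the band is `k` standard deviations wide. The names `band_detect`, `window`, `k` and the sample data are all assumptions for illustration.

```python
from statistics import mean, pstdev


def band_detect(series, window=5, k=3.0):
    """Flag points that fall outside mean +/- k*std of the previous
    `window` points. Simplified stand-in, not an ARMA model."""
    rows = []
    for i, (t, src) in enumerate(series):
        history = [v for _, v in series[max(0, i - window):i]]
        if len(history) < window:
            # Not enough history yet: report the point as normal.
            rows.append([t, src, src, src, src, 0.0])
            continue
        pred = mean(history)
        spread = k * pstdev(history)
        up, lower = pred + spread, pred - spread
        prob = 1.0 if (src > up or src < lower) else 0.0
        rows.append([t, src, pred, up, lower, prob])
    return rows


# Hypothetical per-minute counts with one obvious spike.
values = [2, 3, 2, 3, 2, 3, 2, 40, 3]
series = [(i * 60, v) for i, v in enumerate(values)]
rows = band_detect(series)
anomalies = [r[0] for r in rows if r[5] == 1.0]
print(anomalies)
```

Only the minute with the spike is flagged in this sample; the point after it stays inside the band because the spike itself has widened the band.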
2.4 How to set up the alert
First, unpack the result of the machine learning function into separate columns:
    __log_signature__: 1814836459146662485 |
    select
      t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
    from (
      select
        ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
      from (
        select
          time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
          avg(num) as num
        from (
          select
            __time__ - __time__ % 60 as time,
            COUNT(*) as num
          from log GROUP BY time order by time desc )
        GROUP by time order by time ASC )) , unnest(res) as t(t1)
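The unpacking above treats the function result as a list of six-element rows and gives each position a name. A small Python sketch of the same mapping follows; the column order is taken from the query, while the `Point` type and the sample values are invented for illustration.

```python
from typing import NamedTuple


class Point(NamedTuple):
    unixtime: float  # t1[1]
    src: float       # t1[2], observed value
    pred: float      # t1[3], predicted value
    up: float        # t1[4], upper bound
    lower: float     # t1[5], lower bound
    prob: float      # t1[6], anomaly probability


def unnest(res):
    """Turn a list of six-element rows into named points."""
    return [Point(*row) for row in res]


# Made-up result rows, only to show the shape of the data.
res = [
    [1557842400.0, 3.0, 2.8, 5.1, 0.5, 0.0],
    [1557842460.0, 9.0, 3.0, 5.3, 0.7, 1.0],
]
points = unnest(res)
print(points[1].src, points[1].up, points[1].prob)
```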
Alert on the results of the most recent two minutes:
    __log_signature__: 1814836459146662485 |
    select
      unixtime, src, pred, up, lower, prob
    from (
      select
        t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
      from (
        select
          ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
        from (
          select
            time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
            avg(num) as num
          from (
            select
              __time__ - __time__ % 60 as time, COUNT(*) as num
            from log GROUP BY time order by time desc )
          GROUP by time order by time ASC )) , unnest(res) as t(t1) )
    where is_nan(src) = false order by unixtime desc limit 2
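The final filter keeps only rows whose `src` is a real number and then takes the two newest ones. A minimal Python sketch of that selection is below. It assumes the NaN rows are points without an observed value (for example forecast points), which the article does not state explicitly; the helper name and sample rows are hypothetical.

```python
import math


def latest_observed(points, n=2):
    """Drop rows whose src is NaN, then return the n newest rows,
    newest first (mirrors: where is_nan(src) = false
    order by unixtime desc limit n)."""
    observed = [p for p in points if not math.isnan(p["src"])]
    observed.sort(key=lambda p: p["unixtime"], reverse=True)
    return observed[:n]


# Hypothetical rows; the last one has no observed value.
points = [
    {"unixtime": 60, "src": 2.0},
    {"unixtime": 120, "src": 3.0},
    {"unixtime": 180, "src": 9.0},
    {"unixtime": 240, "src": float("nan")},
]
recent = latest_observed(points)
print([p["unixtime"] for p in recent])
```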
Alert on upward spikes, with a fallback policy:
    __log_signature__: 1814836459146662485 |
    select
      sum(prob) as sumProb, max(src) as srcMax, max(up) as upMax
    from (
      select
        unixtime, src, pred, up, lower, prob
      from (
        select
          t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
        from (
          select
            ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
          from (
            select
              time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
            from (
              select
                __time__ - __time__ % 60 as time, COUNT(*) as num
              from log GROUP BY time order by time desc )
            GROUP by time order by time ASC )) , unnest(res) as t(t1) )
      where is_nan(src) = false order by unixtime desc limit 2 )
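One way the three aggregated values could feed an alert decision is sketched below in Python. The rule itself is an assumption, since the article does not spell out its trigger expression: fire when an anomaly was flagged and the observed maximum exceeds the upper bound (an upward spike), and also fire whenever the observed maximum crosses a fixed `hard_limit`, as a fallback. `hard_limit`, the function names and the sample rows are hypothetical.

```python
def aggregate(points):
    """Mirror the outer query: sum(prob), max(src), max(up)."""
    return {
        "sumProb": sum(p["prob"] for p in points),
        "srcMax": max(p["src"] for p in points),
        "upMax": max(p["up"] for p in points),
    }


def should_alert(summary, hard_limit=100.0):
    """Assumed rule: upward spike flagged by the detector,
    or a fixed ceiling as the fallback."""
    spike = summary["sumProb"] > 0 and summary["srcMax"] > summary["upMax"]
    fallback = summary["srcMax"] > hard_limit
    return spike or fallback


# Hypothetical rows for the two most recent minutes.
spiky = [
    {"src": 9.0, "up": 5.3, "prob": 1.0},
    {"src": 3.0, "up": 5.1, "prob": 0.0},
]
quiet = [
    {"src": 3.0, "up": 5.1, "prob": 0.0},
    {"src": 2.0, "up": 5.0, "prob": 0.0},
]
print(should_alert(aggregate(spiky)), should_alert(aggregate(quiet)))
```

The fallback matters when the detector's band has drifted upward with the data: a fixed ceiling still fires even if no point is flagged as anomalous.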
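The specific alert settings are as follows:
3. A word from our sponsor
3.1 Going further with logs
Demos of the various Log Service features are available here: Log Service overview and demos.
For more advanced material on logs, see: Log Service learning path.
3.2 Contact us
For corrections, or to contribute help documentation and best practices, please contact: 悟冥
For questions, please join the DingTalk group:
Source: https://yq.aliyun.com/articles/702456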