STL (Seasonal-Trend decomposition procedure based on Loess) [1] 为时序分解中一种常见的算法,基于 LOESS 将某时刻的数据 \(Y_v\) 分解为趋势分量(trend component)、周期分量(seasonal component)和余项(remainder component):
\[
Y_v = T _v + S_v + R_v \quad v= 1, \cdots, N
\]
STL 分为内循环(inner loop)与外循环(outer loop),其中内循环主要做了趋势拟合与周期分量的计算。假定 \(T_v^{(k)}\)、\(S_v{(k)}\) 为内循环中第 k-1 次 pass 结束时的趋势分量、周期分量,初始时 \(T_v^{(k)} = 0\);并有以下参数:
每个周期相同位置的样本点组成一个子序列(subseries),容易知道这样的子序列共有共有 \(n_(p)\) 个,我们称其为 cycle-subseries。内循环主要分为以下 6 个步骤:
外层循环主要用于调节 robustness weight。如果数据序列中有 outlier,则余项会较大。定义
\[h = 6 * median(|R_v|)
\]
对于位置为 \(v\) 的数据点,其 robustness weight 为
\[\rho_v = B(|R_v|/h)
\]
其中 \(B\) 函数为 bisquare 函数:
\[B(u) = \left \{
{\matrix {{(1-u^2)^2 } & {for \quad 0 \le u < 1} \cr
{0} & {for \quad u \ge 1} \cr
}
}
\right.
\]
然后每一次迭代的内循环中,在 Step 2 与 Step 6 中做 LOESS 回归时,邻域权重(neighborhood weight)需要乘以 \(\rho_v\),以减少 outlier 对回归的影响。STL 的具体流程如下:
- outer loop:
- 计算robustness weight;
- inner loop:
- Step 1 去趋势;
- Step 2 周期子序列平滑;
- Step 3 周期子序列的低通量过滤;
- Step 4 去除平滑周期子序列趋势;
- Step 5 去周期;
- Step 6 趋势平滑;
为了使得算法具有足够的 robustness,所以设计了内循环与外循环。特别地,当 \(n_(i)\) 足够大时,内循环结束时趋势分量与周期分量已收敛;若时序数据中没有明显的 outlier,可以将 \(n_(o)\) 设为 0。
R 提供 STL 函数,底层为作者 Cleveland 的 Fortran 实现。Python 的 statsmodels 实现了一个简单版的时序分解,通过加权滑动平均提取趋势分量,然后对 cycle-subseries 每个时间点数据求平均组成周期分量:
- def seasonal_decompose(x, model="additive", filt=None, freq=None, two_sided=True):
- _pandas_wrapper, pfreq = _maybe_get_pandas_wrapper_freq(x)
- x = np.asanyarray(x).squeeze()
- nobs = len(x)
- ...
- if filt is None:
- if freq % 2 == 0: # split weights at ends
- filt = np.array([.5] + [1] * (freq - 1) + [.5]) / freq
- else:
- filt = np.repeat(1./freq, freq)
- nsides = int(two_sided) + 1
- # Linear filtering via convolution. Centered and backward displaced moving weighted average.
- trend = convolution_filter(x, filt, nsides)
- if model.startswith('m'):
- detrended = x / trend
- else:
- detrended = x - trend
- period_averages = seasonal_mean(detrended, freq)
- if model.startswith('m'):
- period_averages /= np.mean(period_averages)
- else:
- period_averages -= np.mean(period_averages)
- seasonal = np.tile(period_averages, nobs // freq + 1)[:nobs]
- if model.startswith('m'):
- resid = x / seasonal / trend
- else:
- resid = detrended - seasonal
- results = lmap(_pandas_wrapper, [seasonal, trend, resid, x])
- return DecomposeResult(seasonal=results[0], trend=results[1],
- resid=results[2], observed=results[3])
R 版 STL 分解带噪音点数据的结果如下图:
- data = read.csv("artificialWithAnomaly/art_daily_flatmiddle.csv")
- View(data)
- data_decomp <- stl(ts(data[[2]], frequency = 1440/5), s.window = "periodic", robust = TRUE)
- plot(data_decomp)
statsmodels 模块的时序分解的结果如下图:
- import statsmodels.api as sm
- import matplotlib.pyplot as plt
- import pandas as pd
- from date_utils import get_gran, format_timestamp
- dta = pd.read_csv('artificialWithAnomaly/art_daily_flatmiddle.csv',
- usecols=['timestamp', 'value'])
- dta = format_timestamp(dta)
- dta = dta.set_index('timestamp')
- dta['value'] = dta['value'].apply(pd.to_numeric, errors='ignore')
- dta.value.interpolate(inplace=True)
- res = sm.tsa.seasonal_decompose(dta.value, freq=288)
- res.plot()
- plt.show()
[1] Cleveland, Robert B., William S. Cleveland, and Irma Terpenning. "STL: A seasonal-trend decomposition procedure based on loess." Journal of Official Statistics 6.1 (1990): 3.
来源: http://www.cnblogs.com/en-heng/p/7390310.html