时间序列分解算法：STL

1. 详解

STL (Seasonal-Trend decomposition procedure based on Loess) [1] 为时序分解中一种常见的算法，基于 LOESS 将某时刻的数据 $Y_v$ 分解为趋势分量（trend component）、周期分量（seasonal component）和余项（remainder component）:
\[ Y_v = T _v + S_v + R_v \quad v= 1, \cdots, N \]

STL 分为内循环（inner loop）与外循环（outer loop），其中内循环主要做了趋势拟合与周期分量的计算。假定 $T_v^{(k)}$、$S_v{(k)}$ 为内循环中第 k-1 次 pass 结束时的趋势分量、周期分量，初始时 $T_v^{(k)} = 0$；并有以下参数：

$n_(i)$ 内层循环数，
$n_(o)$ 外层循环数，
$n_(p)$ 为一个周期的样本数，
$n_(s)$ 为 Step 2 中 LOESS 平滑参数，
$n_(l)$ 为 Step 3 中 LOESS 平滑参数，
$n_(t)$ 为 Step 6 中 LOESS 平滑参数。

每个周期相同位置的样本点组成一个子序列（subseries），容易知道这样的子序列共有共有 $n_(p)$ 个，我们称其为 cycle-subseries。内循环主要分为以下 6 个步骤：

Step 1: 去趋势（Detrending），减去上一轮结果的趋势分量，$Y_v - T_v^{(k)}$；
Step 2: 周期子序列平滑（Cycle-subseries smoothing），用 LOESS ($q=n_{n(s)}$,$d=1$) 对每个子序列做回归，并向前向后各延展一个周期；平滑结果组成 temporary seasonal series，记为 $C_v^{(k+1)}, \quad v = -n_{(p)} + 1, \cdots, -N + n_{(p)} $；
Step 3: 周期子序列的低通量过滤（Low-Pass Filtering），对上一个步骤的结果序列 $C_v^{(k+1)}$ 依次做长度为 $n_(p)$、$n_(p)$、$3$ 的滑动平均（moving average），然后做 LOESS ($q=n_{n(l)}$,$d=1$) 回归，得到结果序列 $L_v^{(k+1)}, \quad v = 1, \cdots, N$；相当于提取周期子序列的低通量；
Step 4: 去除平滑周期子序列趋势（Detrending of Smoothed Cycle-subseries），$S_v^{(k+1)} = C_v^{(k+1)} - L_v^{(k+1)}$；
Step 5: 去周期（Deseasonalizing），减去周期分量，$Y_v - S_v^{(k+1)}$；
Step 6: 趋势平滑（Trend Smoothing），对于去除周期之后的序列做 LOESS ($q=n_{n(t)}$,$d=1$) 回归，得到趋势分量 $T_v^{(k+1)}$。

外层循环主要用于调节 robustness weight。如果数据序列中有 outlier，则余项会较大。定义
\[h = 6 * median(|R_v|) \]

对于位置为 $v$ 的数据点，其 robustness weight 为
\[\rho_v = B(|R_v|/h) \]
其中 $B$ 函数为 bisquare 函数：
\[B(u) = \left \{ {\matrix {{(1-u^2)^2 } & {for \quad 0 \le u < 1} \cr {0} & {for \quad u \ge 1} \cr } } \right. \]
然后每一次迭代的内循环中，在 Step 2 与 Step 6 中做 LOESS 回归时，邻域权重（neighborhood weight）需要乘以 $\rho_v$，以减少 outlier 对回归的影响。STL 的具体流程如下：

outer loop:
    计算robustness weight;
    inner loop:
        Step 1 去趋势;
        Step 2 周期子序列平滑;
        Step 3 周期子序列的低通量过滤;
        Step 4 去除平滑周期子序列趋势;
        Step 5 去周期;
        Step 6 趋势平滑;

为了使得算法具有足够的 robustness，所以设计了内循环与外循环。特别地，当 $n_(i)$ 足够大时，内循环结束时趋势分量与周期分量已收敛；若时序数据中没有明显的 outlier，可以将 $n_(o)$ 设为 0。

R 提供 STL 函数，底层为作者 Cleveland 的 Fortran 实现。Python 的 statsmodels 实现了一个简单版的时序分解，通过加权滑动平均提取趋势分量，然后对 cycle-subseries 每个时间点数据求平均组成周期分量：

def seasonal_decompose(x, model="additive", filt=None, freq=None, two_sided=True):
    _pandas_wrapper, pfreq = _maybe_get_pandas_wrapper_freq(x)
    x = np.asanyarray(x).squeeze()
    nobs = len(x)
    ...
    if filt is None:
      if freq % 2 == 0:  # split weights at ends
        filt = np.array([.5] + [1] * (freq - 1) + [.5]) / freq
        else:
          filt = np.repeat(1./freq, freq)
 
    nsides = int(two_sided) + 1
    # Linear filtering via convolution. Centered and backward displaced moving weighted average.
    trend = convolution_filter(x, filt, nsides)
    if model.startswith('m'):
        detrended = x / trend
    else:
        detrended = x - trend
 
    period_averages = seasonal_mean(detrended, freq)
 
    if model.startswith('m'):
        period_averages /= np.mean(period_averages)
    else:
        period_averages -= np.mean(period_averages)
 
    seasonal = np.tile(period_averages, nobs // freq + 1)[:nobs]
 
    if model.startswith('m'):
        resid = x / seasonal / trend
    else:
        resid = detrended - seasonal
 
    results = lmap(_pandas_wrapper, [seasonal, trend, resid, x])
    return DecomposeResult(seasonal=results[0], trend=results[1],
                           resid=results[2], observed=results[3])

R 版 STL 分解带噪音点数据的结果如下图：

data = read.csv("artificialWithAnomaly/art_daily_flatmiddle.csv")
View(data)
data_decomp <- stl(ts(data[[2]], frequency = 1440/5), s.window = "periodic", robust = TRUE)
plot(data_decomp)

statsmodels 模块的时序分解的结果如下图：

import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
from date_utils import get_gran, format_timestamp
 
 
dta = pd.read_csv('artificialWithAnomaly/art_daily_flatmiddle.csv',
                                   usecols=['timestamp', 'value'])
dta = format_timestamp(dta)
dta = dta.set_index('timestamp')
dta['value'] = dta['value'].apply(pd.to_numeric, errors='ignore')
dta.value.interpolate(inplace=True)
res = sm.tsa.seasonal_decompose(dta.value, freq=288)
res.plot()
plt.show()

2. 参考资料

[1] Cleveland, Robert B., William S. Cleveland, and Irma Terpenning. "STL: A seasonal-trend decomposition procedure based on loess." Journal of Official Statistics 6.1 (1990): 3.

来源: http://www.cnblogs.com/en-heng/p/7390310.html

与本文相关文章

暂无,快来抢沙发吧！