Nonlinear gated experts for time series: discovering regimes and avoiding overfitting.

Author information

Weigend A S, Mangeas M, Srivastava A N

Affiliation

Department of Computer Science, University of Colorado, Boulder, 80309-0430, USA.

Publication information

Int J Neural Syst. 1995 Dec;6(4):373-99. doi: 10.1142/s0129065795000251.

DOI:10.1142/s0129065795000251

PMID:8963468

Abstract

In the analysis and prediction of real-world systems, two of the key problems are nonstationarity (often in the form of switching between regimes) and overfitting (particularly serious for noisy processes). This article addresses these problems using gated experts, consisting of a (nonlinear) gating network, and several (also nonlinear) competing experts. Each expert learns to predict the conditional mean, and each expert adapts its width to match the noise level in its regime. The gating network learns to predict the probability of each expert, given the input. This article focuses on the case where the gating network bases its decision on information from the inputs. This can be contrasted to hidden Markov models where the decision is based on the previous state(s) (i.e. on the output of the gating network at the previous time step), as well as to averaging over several predictors. In contrast, gated experts soft-partition the input space, only learning to model their region. This article discusses the underlying statistical assumptions, derives the weight update rules, and compares the performance of gated experts to standard methods on three time series: (1) a computer-generated series, obtained by randomly switching between two nonlinear processes; (2) a time series from the Santa Fe Time Series Competition (the light intensity of a laser in chaotic state); and (3) the daily electricity demand of France, a real-world multivariate problem with structure on several time scales. The main results are: (1) the gating network correctly discovers the different regimes of the process; (2) the widths associated with each expert are important for the segmentation task (and they can be used to characterize the sub-processes); and (3) there is less overfitting compared to single networks (homogeneous multilayer perceptrons), since the experts learn to match their variances to the (local) noise levels. This can be viewed as matching the local complexity of the model to the local complexity of the data.
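
The architecture described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single tanh layer used for each expert, the linear-softmax gate, the function name `gated_experts_step`, and all array shapes are assumptions chosen for brevity. The width update shown is the usual responsibility-weighted variance estimate for a Gaussian mixture, which may differ in detail from the update rules derived in the article.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gated_experts_step(x, y, gate_w, expert_w, sigma):
    """One evaluation of a toy gated-experts model (hypothetical helper).

    x: (N, D) inputs, y: (N,) targets,
    gate_w, expert_w: (D, K) weights, sigma: (K,) expert widths.
    """
    # Gate: probability of each expert given the input only.
    g = softmax(x @ gate_w)                      # (N, K)
    # Experts: each predicts a conditional mean (toy nonlinear map).
    mu = np.tanh(x @ expert_w)                   # (N, K)
    # Gaussian density of the target under each expert's mean and width.
    err = y[:, None] - mu
    dens = np.exp(-0.5 * (err / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    # Mixture likelihood and posterior responsibility of each expert.
    joint = g * dens
    lik = joint.sum(axis=1)
    h = joint / lik[:, None]                     # (N, K)
    # Width update: responsibility-weighted squared error per expert.
    new_sigma = np.sqrt((h * err ** 2).sum(axis=0) / h.sum(axis=0))
    # Soft-partitioned prediction and mean negative log-likelihood.
    y_hat = (g * mu).sum(axis=1)
    nll = -np.log(lik).mean()
    return g, h, new_sigma, y_hat, nll

# Usage on synthetic data (values are arbitrary, for illustration only).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(50, 3))
y_demo = rng.normal(size=50)
g_demo, h_demo, sigma_demo, y_hat_demo, nll_demo = gated_experts_step(
    x_demo, y_demo,
    rng.normal(size=(3, 2)), rng.normal(size=(3, 2)),
    np.array([1.0, 0.5]),
)
```

In this sketch the gate output `g` depends only on the input, which is the case the abstract focuses on, while the responsibilities `h` also use the target and are what drive the per-expert width estimates.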
