Zou Changliang, Wang Guanghui, Li Runze
Institute of Statistics and LPMC, Nankai University, Tianjin 300071, China
Department of Statistics, and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111, USA
Ann Stat. 2020 Feb;48(1):413-439. doi: 10.1214/19-aos1814. Epub 2020 Feb 17.
In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion, which balances a term quantifying model fit against a penalization term accounting for model complexity; the penalty increases with the number of change-points and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. We propose a data-driven selection criterion that is applicable to most popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures how well a specified model fits a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under mild conditions. The effectiveness of the proposed selection criterion is demonstrated on a variety of numerical experiments and real-data examples.
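The abstract's key idea can be sketched concretely. Below is a minimal, hedged illustration (not the authors' exact COPSS procedure) of cross-validation with an order-preserved sample split for a piecewise-constant mean model: odd-indexed observations serve as the training sample and even-indexed observations as the validation sample, change-points are fitted on the training half by exact optimal partitioning (dynamic programming on squared error), and the candidate number of change-points minimizing the squared prediction error on the held-out half is selected. All function names and the simulated data are illustrative assumptions.

```python
import numpy as np

def fit_piecewise_mean(y, k):
    """Fit a piecewise-constant mean with k change-points by exact
    dynamic programming (optimal partitioning on squared error)."""
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    cs2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        # Sum of squared errors of segment y[i:j] around its mean.
        s = cs[j] - cs[i]
        return (cs2[j] - cs2[i]) - s * s / (j - i)

    # dp[m][j] = minimal cost of fitting y[:j] with m segments.
    dp = np.full((k + 2, n + 1), np.inf)
    back = np.zeros((k + 2, n + 1), dtype=int)
    dp[0][0] = 0.0
    for m in range(1, k + 2):
        for j in range(m, n + 1):
            best, arg = np.inf, m - 1
            for i in range(m - 1, j):
                c = dp[m - 1][i] + cost(i, j)
                if c < best:
                    best, arg = c, i
            dp[m][j], back[m][j] = best, arg

    # Backtrack to recover the change-point locations (segment starts).
    cps, j = [], n
    for m in range(k + 1, 0, -1):
        i = back[m][j]
        if m > 1:
            cps.append(i)
        j = i
    return sorted(cps)

def cv_select_k(y, k_max):
    """Order-preserved sample splitting: odd-indexed observations train,
    even-indexed observations validate; select the number of change-points
    minimizing the squared prediction error on the validation half."""
    y_tr, y_va = y[0::2], y[1::2]
    errs = []
    for k in range(k_max + 1):
        cps = fit_piecewise_mean(y_tr, k)
        bounds = [0] + cps + [len(y_tr)]
        pred = np.empty(len(y_tr))
        for a, b in zip(bounds[:-1], bounds[1:]):
            pred[a:b] = y_tr[a:b].mean()
        # Adjacent indexing aligns each validation point with its
        # training-sample neighbor's fitted segment mean.
        m = min(len(pred), len(y_va))
        errs.append(float(np.mean((y_va[:m] - pred[:m]) ** 2)))
    return int(np.argmin(errs)), errs
```

Because training and validation points interleave in time, each held-out observation is predicted by the fitted mean of its immediate neighbor's segment, so the split preserves the temporal order of the data, which is what makes cross-validation valid here despite the sequential structure.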