Consistent Selection of the Number of Change-Points via Sample-Splitting

Authors

Zou Changliang, Wang Guanghui, Li Runze

Affiliations

Institute of Statistics and LPMC, Nankai University, Tianjin 300071, China

Department of Statistics, and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111, USA

Publication information

Ann Stat. 2020 Feb;48(1):413-439. doi: 10.1214/19-aos1814. Epub 2020 Feb 17.

Abstract

In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion, which balances a term quantifying model fit against a penalization term that accounts for model complexity; the penalty increases with the number of change-points and thereby limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. We propose a data-driven selection criterion that is applicable to most popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures the fit of a specified model for a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under some mild conditions. The effectiveness of the proposed selection criterion is demonstrated in a variety of numerical experiments and real-data examples.
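
The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a piecewise-constant mean model, uses an odd/even split as one possible order-preserving split, and uses an exhaustive dynamic program as the optimal partitioning step. The function names (`segment_cost_fn`, `best_partitions`, `prediction_error`, `select_num_changepoints`) are hypothetical and chosen only for this illustration.

```python
import numpy as np

def segment_cost_fn(y):
    """Return cost(i, j): residual sum of squares of y[i:j] around its mean."""
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))
    def cost(i, j):
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)
    return cost

def best_partitions(y, max_k):
    """For each k = 0..max_k, change-point locations minimizing the RSS (assumed DP)."""
    n = len(y)
    cost = segment_cost_fn(y)
    # dp[k, j]: minimal RSS of y[:j] split into k + 1 segments
    dp = np.full((max_k + 1, n + 1), np.inf)
    back = np.zeros((max_k + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        dp[0, j] = cost(0, j)
    for k in range(1, max_k + 1):
        for j in range(k + 1, n + 1):
            for i in range(k, j):
                c = dp[k - 1, i] + cost(i, j)
                if c < dp[k, j]:
                    dp[k, j], back[k, j] = c, i
    partitions = []
    for k in range(max_k + 1):
        cps, j = [], n
        for kk in range(k, 0, -1):  # walk the back-pointers from the end
            j = back[kk, j]
            cps.append(int(j))
        partitions.append(sorted(cps))
    return partitions

def prediction_error(train, test, cps):
    """Squared error when test points are predicted by the train segment means."""
    bounds = [0] + list(cps) + [len(train)]
    return sum(np.sum((test[a:b] - train[a:b].mean()) ** 2)
               for a, b in zip(bounds[:-1], bounds[1:]))

def select_num_changepoints(y, max_k):
    """Pick the k with the smallest two-fold, order-preserving CV error."""
    y = np.asarray(y, dtype=float)
    n = len(y) - len(y) % 2          # drop the last point if the length is odd
    odd, even = y[0:n:2], y[1:n:2]   # both halves keep the time order
    total = np.zeros(max_k + 1)
    for train, test in ((odd, even), (even, odd)):
        for k, cps in enumerate(best_partitions(train, max_k)):
            total[k] += prediction_error(train, test, cps)
    return int(np.argmin(total)), total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = np.concatenate([np.zeros(60), np.full(60, 5.0), np.zeros(60)])
    k_hat, errors = select_num_changepoints(signal + rng.normal(size=180), max_k=4)
    print("selected number of change-points:", k_hat)
    print("cross-validated squared prediction errors:", errors)
```

Because the two halves interleave in time, a segmentation fitted on one half can be scored directly on the other without any penalty term. Under the assumptions above, too few change-points leave large bias in the prediction error, while extra change-points fit noise in the training half and tend not to reduce the error on the held-out half.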

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1998/7397423/258422876df4/nihms-1022718-f0001.jpg

Similar articles

2
Consistent Model Selection in Segmented Line Regression.
J Stat Plan Inference. 2016 Mar 1;170:106-116. doi: 10.1016/j.jspi.2015.09.008.
3
Consistent Estimation of Dimensionality for Data-Driven Methods in fMRI Analysis.
IEEE Trans Med Imaging. 2019 Feb;38(2):493-503. doi: 10.1109/TMI.2018.2866640. Epub 2018 Aug 22.
4
Optimal Subsampling for Large Sample Logistic Regression.
J Am Stat Assoc. 2018;113(522):829-844. doi: 10.1080/01621459.2017.1292914. Epub 2018 Jun 6.
