Zou Changliang, Wang Guanghui, Li Runze
Institute of Statistics and LPMC, Nankai University, Tianjin 300071, China
Department of Statistics, and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111, USA
Ann Stat. 2020 Feb;48(1):413-439. doi: 10.1214/19-aos1814. Epub 2020 Feb 17.
In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion, which balances a term quantifying model fit against a penalization term accounting for model complexity; the penalty increases with the number of change-points and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. We propose a data-driven selection criterion that is applicable to most popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures how well a specified model fits a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under mild conditions. The effectiveness of the proposed selection criterion is demonstrated on a variety of numerical experiments and real-data examples.
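The abstract's key idea can be sketched concretely. Below is a minimal, hedged illustration (not the authors' exact COPSS procedure) of cross-validation with an order-preserved sample split for a piecewise-constant mean model: odd-indexed observations serve as the training sample and even-indexed observations as the validation sample, change-points are fitted on the training half by exact optimal partitioning (dynamic programming on squared error), and the candidate number of change-points minimizing the squared prediction error on the held-out half is selected. All function names and the simulated data are illustrative assumptions.

```python
import numpy as np

def fit_piecewise_mean(y, k):
    """Fit a piecewise-constant mean with k change-points by exact
    dynamic programming (optimal partitioning on squared error)."""
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    cs2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        # Sum of squared errors of segment y[i:j] around its mean.
        s = cs[j] - cs[i]
        return (cs2[j] - cs2[i]) - s * s / (j - i)

    # dp[m][j] = minimal cost of fitting y[:j] with m segments.
    dp = np.full((k + 2, n + 1), np.inf)
    back = np.zeros((k + 2, n + 1), dtype=int)
    dp[0][0] = 0.0
    for m in range(1, k + 2):
        for j in range(m, n + 1):
            best, arg = np.inf, m - 1
            for i in range(m - 1, j):
                c = dp[m - 1][i] + cost(i, j)
                if c < best:
                    best, arg = c, i
            dp[m][j], back[m][j] = best, arg

    # Backtrack to recover the change-point locations (segment starts).
    cps, j = [], n
    for m in range(k + 1, 0, -1):
        i = back[m][j]
        if m > 1:
            cps.append(i)
        j = i
    return sorted(cps)

def cv_select_k(y, k_max):
    """Order-preserved sample splitting: odd-indexed observations train,
    even-indexed observations validate; select the number of change-points
    minimizing the squared prediction error on the validation half."""
    y_tr, y_va = y[0::2], y[1::2]
    errs = []
    for k in range(k_max + 1):
        cps = fit_piecewise_mean(y_tr, k)
        bounds = [0] + cps + [len(y_tr)]
        pred = np.empty(len(y_tr))
        for a, b in zip(bounds[:-1], bounds[1:]):
            pred[a:b] = y_tr[a:b].mean()
        # Adjacent indexing aligns each validation point with its
        # training-sample neighbor's fitted segment mean.
        m = min(len(pred), len(y_va))
        errs.append(float(np.mean((y_va[:m] - pred[:m]) ** 2)))
    return int(np.argmin(errs)), errs
```

Because training and validation points interleave in time, each held-out observation is predicted by the fitted mean of its immediate neighbor's segment, so the split preserves the temporal order of the data, which is what makes cross-validation valid here despite the sequential structure.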