Wang Jinjuan, Li Na, Meng Zhen, Li Qizhai
School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China.
School of Applied Science, Beijing Information Science and Technology University, Beijing, China.
Stat Med. 2023 Nov 10;42(25):4644-4663. doi: 10.1002/sim.9881. Epub 2023 Aug 30.
Identifying the existence and locations of change points is a common task in many areas of applied statistics. Existing change point detection methods may perform poorly on high-dimensional data because they rely on distributional assumptions that are hard to verify in practice. Moreover, some methods require certain parameters (such as the number of change points) to be estimated beforehand, which makes their power sensitive to these values. Here, we propose a kernel-based U-statistic to identify change points (KUCP) for high-dimensional data, which is free of distributional assumptions and hyperparameter estimation. Specifically, we employ a kernel function to describe similarities among the subjects and construct a U-statistic to test whether a change point exists at a given location. The asymptotic properties of the U-statistic are derived. We also develop a procedure to locate the change points sequentially via a dichotomy algorithm. Extensive simulations demonstrate that KUCP is more sensitive in identifying the existence of change points, and more accurate in locating them, than its counterparts. We further illustrate its practical utility by analyzing a human brain gene expression dataset to detect the time point at which gene expression profiles begin to change, which has been reported to be closely related to brain aging.
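The general idea in the abstract (a kernel describes pairwise similarity, a U-statistic compares the data before and after a candidate location, and a dichotomy step recurses on the two halves) can be illustrated with a minimal Python sketch. This is not the authors' KUCP statistic or its asymptotic test: it swaps in the standard unbiased MMD² U-statistic with a Gaussian kernel, a median-based bandwidth, and a permutation test, all of which are assumptions made for illustration. The function names, the split weighting, and the defaults (`min_size`, `alpha`, `n_perm`) are likewise hypothetical.

```python
import numpy as np


def gaussian_kernel_matrix(X):
    """Pairwise Gaussian kernel; bandwidth from the median squared distance (an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    scale = np.median(d2[np.triu_indices_from(d2, k=1)])
    if scale <= 0:
        scale = 1.0
    return np.exp(-d2 / scale)


def split_statistic(K, t):
    """Unbiased MMD^2 U-statistic comparing observations [0, t) with [t, n)."""
    m = K.shape[0] - t
    Kxx, Kyy, Kxy = K[:t, :t], K[t:, t:], K[:t, t:]
    within_x = (Kxx.sum() - np.trace(Kxx)) / (t * (t - 1))
    within_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return within_x + within_y - 2.0 * Kxy.mean()


def best_split(K, min_size):
    """Scan candidate locations and return (location, weighted statistic) of the maximum."""
    n = K.shape[0]
    candidates = list(range(min_size, n - min_size + 1))
    # The weight t(n - t)/n^2 damps noisy candidates near the edges (illustrative choice).
    stats = [split_statistic(K, t) * t * (n - t) / n ** 2 for t in candidates]
    i = int(np.argmax(stats))
    return candidates[i], stats[i]


def locate_change_points(X, min_size=5, alpha=0.05, n_perm=200, rng=None, offset=0):
    """Dichotomy (binary segmentation): test a segment, split at the best location, recurse."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    if n < 2 * min_size:
        return []
    K = gaussian_kernel_matrix(X)
    t_hat, observed = best_split(K, min_size)
    # Permutation null: shuffling the time order removes any change point.
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        _, s = best_split(K[np.ix_(p, p)], min_size)
        exceed += int(s >= observed)
    p_value = (1 + exceed) / (1 + n_perm)
    if p_value > alpha:
        return []
    left = locate_change_points(X[:t_hat], min_size, alpha, n_perm, rng, offset)
    right = locate_change_points(X[t_hat:], min_size, alpha, n_perm, rng, offset + t_hat)
    return left + [offset + t_hat] + right


if __name__ == "__main__":
    # Simulated example: 60 observations in 50 dimensions with a mean shift at index 30.
    gen = np.random.default_rng(0)
    X = gen.normal(size=(60, 50))
    X[30:] += 1.0
    print(locate_change_points(X, n_perm=100, rng=1))
```

Each row of `X` is one observation in time order, and the returned indices mark the first observation after each detected change. The permutation test stands in for the asymptotic distribution derived in the paper, at the cost of extra computation per segment.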