Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA.
Biometrics. 2023 Dec;79(4):3332-3344. doi: 10.1111/biom.13844. Epub 2023 Mar 14.
We consider inference problems for high-dimensional (HD) functional data in which a large number T of densely repeated measurements is taken on a large number p of variables from a small number n of experimental units. The spatial and temporal dependence, high dimensionality, and dense repeated measurements pose theoretical and computational challenges. This paper has two aims. The first is to address the theoretical and computational challenges in testing equivalence among covariance matrices from HD functional data. The second is to provide computationally efficient, tuning-free tools with guaranteed stochastic error control. The weak convergence of the stochastic process formed by the test statistics is established under the "large p, large T, and small n" setting. If the null hypothesis is rejected, we further show that the locations of the change points can be estimated consistently. The estimator's rate of convergence is shown to depend on the data dimension, sample size, number of repeated measurements, and signal-to-noise ratio. We also show that the proposed computational algorithms significantly reduce computation time and are applicable to real-world data with a large number of HD repeated measurements, for example functional magnetic resonance imaging (fMRI) data. Simulation results demonstrate both the finite-sample performance and the computational effectiveness of the proposed procedures. We observe that the empirical size of the test is well controlled at the nominal level and that the locations of multiple change points can be accurately identified. An application to fMRI data demonstrates that the proposed methods can identify event boundaries in the preface of the television series Sherlock. Code implementing the procedures is available in an R package named TechPhD.
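To make the change-point idea in the abstract concrete, the following is a minimal sketch of a naive covariance scan for a single change point. It is not the test statistic or the algorithm from the paper, and it does not use the TechPhD package. It assumes data stored as an array of shape (n, T, p), independence across time points, and at most one change; the function name `covariance_scan` and the weighting `k (T - k) / T^2` are illustrative choices. For each candidate split k, it compares the average sample covariance over the first k time points with the average over the remaining T - k, using the squared Frobenius norm.

```python
import numpy as np


def covariance_scan(X):
    """Naive single change-point scan on covariance matrices.

    X has shape (n, T, p): n units, T time points, p variables.
    Returns (k_hat, stats), where k_hat is the number of time points
    assigned to the first segment and stats[k - 1] is the scan value
    for split k, with k = 1, ..., T - 1.
    Hypothetical helper for illustration; not the paper's statistic.
    """
    n, T, p = X.shape
    # Center each (time, variable) cell across the n units.
    Xc = X - X.mean(axis=0, keepdims=True)
    # S[t] is the p x p sample covariance matrix at time t.
    S = np.einsum("ntp,ntq->tpq", Xc, Xc) / (n - 1)
    # Cumulative sums let each split be evaluated in O(p^2).
    csum = np.cumsum(S, axis=0)
    total = csum[-1]
    stats = np.empty(T - 1)
    for k in range(1, T):
        before = csum[k - 1] / k
        after = (total - csum[k - 1]) / (T - k)
        # Down-weight splits near the boundaries (illustrative choice).
        weight = k * (T - k) / T**2
        stats[k - 1] = weight * np.sum((before - after) ** 2)
    k_hat = int(np.argmax(stats)) + 1
    return k_hat, stats


# Simulated example: the covariance changes from I to 4I after time 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 40, 5))
X[:, 20:, :] *= 2.0
k_hat, stats = covariance_scan(X)
print("estimated change point:", k_hat)
```

This plug-in statistic is biased when p is large relative to n, which is one reason a method designed for the "large p, large T, and small n" setting needs a more careful construction and a formal calibration of the rejection threshold.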