Rafor Brian, Gauran Iris Ivy, Ombao Hernando, Lansangan Joseph Ryan, Barrios Erniel
School of Statistics, University of the Philippines, Diliman, Quezon City, 1101, Philippines.
Statistics Program, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, 23955, Kingdom of Saudi Arabia.
Sci Rep. 2025 May 14;15(1):16752. doi: 10.1038/s41598-025-00554-w.
With non-invasive measuring devices now more accessible, experimental responses can be observed as high-dimensional, high-frequency time series. Despite the complex dependence structure of such data, sample size determination and power analysis remain central to statistical inference. The problem is compounded by the complex time-lag structure and phase shifts usually observed in non-uniform but normal processes, which are typical of medical imaging data. To address these issues in case-control studies, responses can be analyzed for evidence of group differences through time series clustering based on dynamic time warping. Warping multiple time series provides a flexible distance measure that does not require time points to be concurrent. Time series clustering partitions experimental units into groups, so that effect size can be measured through the sum of squares of pairwise distances between warped time series. Time series clustering thus provides an alternative to analysis of variance when experimental responses are high-frequency time series data. A kernel regression is formulated to link sample size, effect size, power of the test, and level of significance while accounting for the structure of the data-generating process of the time series responses. This gives clinicians a strategy for optimizing the power of the test that can be achieved with a minimal sample size in this experimental setup. The time series clustering method differentiates case and control groups both in simulated data and in the ADHD-200 fMRI dataset. The distance measured between two or more groups of time series can be used to determine the sample size needed for a target power.
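To make the distance idea concrete, the following is a minimal sketch of a dynamic time warping distance and a sum of squared pairwise distances between two groups of series. It is an illustration under stated assumptions, not the authors' implementation: the function names `dtw_distance` and `pairwise_sum_of_squares`, the absolute-difference local cost, and the unconstrained warping window are all hypothetical choices, and the abstract does not specify the exact form of the effect size statistic.

```python
def dtw_distance(x, y):
    """Textbook dynamic-programming DTW between two univariate series.

    Assumption: local cost is the absolute difference and the warping
    path is unconstrained (no window). The paper may use other choices.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def pairwise_sum_of_squares(group_a, group_b):
    """Sum of squared DTW distances over all cross-group pairs.

    Hypothetical stand-in for a between-group effect size measure.
    """
    return sum(dtw_distance(a, b) ** 2 for a in group_a for b in group_b)


if __name__ == "__main__":
    # A series and a copy delayed by one time point: DTW absorbs the shift.
    base = [0, 1, 2]
    delayed = [0, 0, 1, 2]
    print(dtw_distance(base, delayed))

    # Two toy "groups" that differ in level rather than timing.
    controls = [[0, 0, 0], [0, 0, 0]]
    cases = [[1, 1, 1]]
    print(pairwise_sum_of_squares(controls, cases))
```

In this sketch a pure time shift contributes no distance, while a level difference between groups accumulates across all cross-group pairs, which is the property that makes a warped distance usable as a group-separation measure.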