Mu Wancen, Chen Jiawen, Davis Eric S, Reed Kathleen, Phanstiel Douglas, Love Michael I, Li Didong
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
Biometrics. 2025 Jan 7;81(1). doi: 10.1093/biomtc/ujae156.
Investigating the relationship between time series, particularly the lead-lag effect, is a common question across disciplines and is especially relevant when uncovering biological processes. However, analyzing such time series presents several challenges. First, for technical reasons, observations are often made at irregularly spaced time points. Second, some lead-lag effects are transient, so the time lag must be estimated from a limited number of time points. Third, external factors also affect these time series, so a similarity metric is needed to assess the lead-lag relationship. To address these issues, we introduce a model based on Gaussian processes that can flexibly estimate lead-lag effects for irregularly sampled time series. In addition, our method outputs dissimilarity scores, which broadens its applications to tasks such as ranking or clustering multiple pairs of time series according to the strength of their lead-lag effects in the presence of external factors. Crucially, we provide a series of theoretical proofs establishing the validity of our proposed kernels and the identifiability of the kernel parameters. In various simulations and real-world applications, particularly the study of dynamic chromatin interactions, our model shows improvements over other leading methods.
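The general idea of estimating a time lag between two irregularly sampled series with a Gaussian process can be illustrated with a minimal sketch. This is not the authors' model: it assumes both series are noisy observations of one latent function under a plain squared-exponential kernel, with the second series shifted in time, and it picks the lag by grid search over the joint log marginal likelihood with fixed hyperparameters. It does not compute the dissimilarity scores described in the abstract. All names (`rbf`, `cholesky`, `log_marginal_likelihood`, `estimate_lag`) and all parameter values are hypothetical, and the code uses only the Python standard library so that it is self-contained.

```python
import math
import random


def rbf(u, v, variance=1.0, length=1.0):
    """Squared-exponential covariance between two (shifted) time points."""
    return variance * math.exp(-0.5 * ((u - v) / length) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_marginal_likelihood(times, values, noise=0.05):
    """Zero-mean GP log marginal likelihood; `noise` is an assumed noise variance."""
    n = len(times)
    cov = [[rbf(times[i], times[j]) + (noise if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    # Forward substitution solves low @ z = values, so y' K^{-1} y = z' z.
    z = []
    for i in range(n):
        z.append((values[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))


def estimate_lag(t_a, y_a, t_b, y_b, grid):
    """Return the candidate lag that maximizes the joint marginal likelihood."""
    def score(lag):
        # Shifting series b back by `lag` should align it with series a.
        return log_marginal_likelihood(t_a + [t - lag for t in t_b], y_a + y_b)
    return max(grid, key=score)


def signal(t):
    """Synthetic latent signal used only for this demonstration."""
    return math.sin(t)


# Simulated example: series b follows series a with a lag of 1.5 time units,
# and the two series are observed at different, irregular time points.
random.seed(0)
true_lag = 1.5
t_a = sorted(random.uniform(0, 10) for _ in range(15))
t_b = sorted(random.uniform(0, 10) for _ in range(15))
y_a = [signal(t) + random.gauss(0, 0.05) for t in t_a]
y_b = [signal(t - true_lag) + random.gauss(0, 0.05) for t in t_b]

grid = [i * 0.25 for i in range(-12, 13)]  # candidate lags from -3 to 3
lag_hat = estimate_lag(t_a, y_a, t_b, y_b, grid)
print("estimated lag:", lag_hat)
```

Because the shift is applied to the kernel inputs, the two series never need to share time points, which is what makes this kind of approach usable on irregular sampling. A fuller treatment would also optimize the kernel hyperparameters rather than fixing them.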