Sahay Shabnam, Adhikari Shishir, Hormoz Sahand, Chakrabarti Shaon
Department of Computer Science, Indian Institute of Technology Bombay.
Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Bangalore.
bioRxiv. 2023 Apr 18:2023.03.21.533651. doi: 10.1101/2023.03.21.533651.
Detecting oscillations in time series remains a challenging problem even after decades of research. In chronobiology, rhythms in time-series datasets (for instance gene expression, eclosion, egg-laying and feeding) tend to be low amplitude, display large variations amongst replicates, and often exhibit varying peak-to-peak distances (non-stationarity). Most currently available rhythm detection methods are not specifically designed to handle such datasets. Here we introduce a new method, ODeGP (Oscillation Detection using Gaussian Processes), which combines Gaussian Process (GP) regression with Bayesian inference to provide a flexible approach to the problem. Besides naturally incorporating measurement errors and non-uniformly sampled data, ODeGP uses a recently developed kernel to improve detection of non-stationary waveforms. An additional advantage is that by using Bayes factors instead of p-values, ODeGP models both the null (non-rhythmic) and the alternative (rhythmic) hypotheses. Using a variety of synthetic datasets we first demonstrate that ODeGP almost always outperforms eight commonly used methods in detecting stationary as well as non-stationary oscillations. Next, on analyzing existing qPCR datasets that exhibit low amplitude and noisy oscillations, we demonstrate that our method is more sensitive than existing methods at detecting weak oscillations. Finally, we generate new qPCR time-series datasets on pluripotent mouse embryonic stem cells, which are expected to exhibit no oscillations of the core circadian clock genes. Surprisingly, we discover using ODeGP that increasing cell density can result in the rapid generation of oscillations in the gene, thus highlighting our method’s ability to discover unexpected patterns. In its current implementation, ODeGP (available as an R package) is meant only for analyzing single or a few time-trajectories, not genome-wide datasets.
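The idea of comparing a rhythmic and a non-rhythmic GP model through a Bayes factor can be illustrated with a minimal sketch. This is not the ODeGP implementation (which is an R package and uses a non-stationary kernel). The kernels (a standard periodic kernel and a squared-exponential kernel), the hyperparameter grids with uniform priors, the function names, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def log_marginal_likelihood(K, y, noise_sd):
    """GP log evidence for zero-mean data y with per-point measurement error."""
    n = len(y)
    L = np.linalg.cholesky(K + np.diag(noise_sd ** 2))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def periodic_kernel(t, period, length):
    # Standard periodic kernel (assumed here; not ODeGP's non-stationary kernel).
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

def rbf_kernel(t, length):
    # Squared-exponential kernel used as the non-rhythmic (null) model.
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def log_mean_exp(values):
    # Averages likelihoods over a grid, i.e. a uniform prior on the grid points.
    v = np.asarray(values)
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))

def log_bayes_factor(t, y, noise_sd):
    """log [ p(y | rhythmic) / p(y | non-rhythmic) ] on illustrative grids."""
    rhythmic = [
        log_marginal_likelihood(periodic_kernel(t, p, l), y, noise_sd)
        for p in (20.0, 22.0, 24.0, 26.0, 28.0)
        for l in (0.5, 1.0, 2.0)
    ]
    null = [
        log_marginal_likelihood(rbf_kernel(t, l), y, noise_sd)
        for l in (2.0, 4.0, 8.0, 16.0, 32.0)
    ]
    return log_mean_exp(rhythmic) - log_mean_exp(null)

# Synthetic example: a noisy 24 h rhythm sampled at non-uniform time points.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 72.0, size=36))
noise_sd = np.full(t.size, 0.2)
y = np.sin(2 * np.pi * t / 24.0) + noise_sd * rng.standard_normal(t.size)

log_bf = log_bayes_factor(t, y, noise_sd)
print(f"log Bayes factor (rhythmic vs non-rhythmic): {log_bf:.2f}")
```

A positive log Bayes factor favours the rhythmic model and a negative one favours the null. Because both hypotheses are modelled explicitly, the result is a relative weight of evidence rather than a p-value. Non-uniform sampling times and per-point measurement errors enter only through the time vector and the diagonal noise term.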