Program in Computational Biology and Bioinformatics, Department of Mathematics, Duke University, Durham, NC 27708, USA; Department of Medicine, Department of Pharmacology, Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA; and Department of Biology, Duke University, Durham, NC 27708, USA.
Bioinformatics. 2013 Dec 15;29(24):3174-80. doi: 10.1093/bioinformatics/btt541. Epub 2013 Sep 20.
To discover and study periodic processes in biological systems, we sought to identify periodic patterns in their gene expression data. We surveyed a large number of available methods for identifying periodicity in time series data and chose representatives of different mathematical perspectives that performed well on both synthetic data and biological data. Synthetic data were used to evaluate how each algorithm responds to different curve shapes, periods, phase shifts, noise levels and sampling rates. The biological datasets we tested represent a variety of periodic processes from different organisms, including the cell cycle and metabolic cycle in Saccharomyces cerevisiae, circadian rhythms in Mus musculus and the root clock in Arabidopsis thaliana.
From these results, we found that each algorithm has different strengths. Based on our findings, we make recommendations for selecting and applying these methods depending on the nature of the data and the periodic patterns of interest. These results can also inform the design of large-scale biological rhythm experiments, so that the resulting data are better suited to detecting periodic signals with these algorithms.
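The synthetic-data evaluation described above can be illustrated with a minimal sketch. It generates a noisy periodic profile with a chosen curve shape, period, phase shift, noise level and sampling rate, then scans a set of candidate periods with a simple periodogram-style projection onto sine and cosine waves. The function names (`synthetic_series`, `best_period`), the two shape labels and all parameter values are assumptions made for illustration, and the scan is a generic stand-in, not one of the algorithms evaluated in the paper.

```python
import math
import random


def synthetic_series(period, phase, noise_sd, n_samples, duration,
                     shape="sine", seed=0):
    """Return (times, values) for one noisy periodic profile.

    Hypothetical helper: `shape` is either "sine" or "peaked".
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    times = [i * duration / n_samples for i in range(n_samples)]
    values = []
    for t in times:
        angle = 2 * math.pi * (t - phase) / period
        if shape == "sine":
            clean = math.cos(angle)
        elif shape == "peaked":
            # Narrow peaks: a cosine rescaled to [0, 1], raised to a power,
            # then mapped back to the range [-1, 1].
            clean = 2 * ((1 + math.cos(angle)) / 2) ** 4 - 1
        else:
            raise ValueError(f"unknown shape: {shape}")
        values.append(clean + rng.gauss(0, noise_sd))
    return times, values


def best_period(times, values, candidate_periods):
    """Return the candidate period with the largest periodogram-style power."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]

    def power(p):
        # Project the centered series onto a cosine and a sine of period p.
        # Using both makes the score insensitive to the phase shift.
        c = sum(v * math.cos(2 * math.pi * t / p)
                for t, v in zip(times, centered))
        s = sum(v * math.sin(2 * math.pi * t / p)
                for t, v in zip(times, centered))
        return c * c + s * s

    return max(candidate_periods, key=power)


# Example: a 24 h rhythm sampled every 2 h for 96 h, shifted by 6 h.
times, values = synthetic_series(period=24, phase=6, noise_sd=0.2,
                                 n_samples=48, duration=96)
print(best_period(times, values, [12, 16, 20, 24, 28, 32, 36]))
```

Varying `shape`, `noise_sd` or `n_samples` in this sketch gives a rough feel for why a detector's performance can depend on curve shape, noise level and sampling rate, which is the kind of sensitivity the synthetic benchmarks were designed to measure.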