Phillips Nick E, Manning Cerys, Papalopulu Nancy, Rattray Magnus
Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom.
PLoS Comput Biol. 2017 May 11;13(5):e1005479. doi: 10.1371/journal.pcbi.1005479. eCollection 2017 May.
Multiple biological processes are driven by oscillatory gene expression at different time scales. Pulsatile dynamics are thought to be widespread, and single-cell live imaging of gene expression has led to a surge of dynamic, possibly oscillatory, data for different gene networks. However, the regulation of gene expression at the level of an individual cell involves reactions between finite numbers of molecules, and this can result in inherent randomness in expression dynamics, which blurs the boundaries between aperiodic fluctuations and noisy oscillators. This poses a new challenge for experimentalists, because neither intuition nor pre-existing methods work well for identifying oscillatory activity in noisy biological time series. Thus, there is an acute need for an objective statistical method for determining whether an experimentally derived noisy time series is periodic. Here, we present a new data analysis method that combines mechanistic stochastic modelling with the powerful framework of non-parametric regression using Gaussian processes. Our method can distinguish oscillatory gene expression from random fluctuations of non-oscillatory expression in single-cell time series, despite peak-to-peak variability in the period and amplitude of single-cell oscillations. We show that our method outperforms the Lomb-Scargle periodogram in correctly classifying cells as oscillatory or non-oscillatory, both in data simulated from a simple genetic oscillator model and in experimental data. Analysis of bioluminescent live-cell imaging shows a significantly greater number of oscillatory cells when luciferase is driven by a Hes1 promoter (10/19), which has previously been reported to oscillate, than when it is driven by the constitutive MoMuLV 5' LTR (MMLV) promoter (0/25). The method can be applied to data from any gene network, both to quantify the proportion of oscillating cells within a population and to measure the period and quality of oscillations.
It is publicly available as a MATLAB package.
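The Gaussian process classification idea summarised above can be illustrated with a minimal sketch. It is not the authors' MATLAB package, and the abstract does not specify the model details, so the following are assumptions: the aperiodic model uses an Ornstein-Uhlenbeck covariance, var·exp(-alpha·|tau|); the oscillatory model multiplies it by cos(beta·tau), so it contains the aperiodic model as the special case beta = 0; both models add white measurement noise; hyperparameters are fitted by a coarse grid search rather than a proper optimiser; and the test statistic is the log-likelihood ratio LLR = 2·(LL_osc − LL_aperiodic). The function names `fit_gp` and `llr_test` are hypothetical. Calibrating a classification threshold, for example against simulated non-oscillatory series, is not shown.

```python
import itertools
import numpy as np


def log_marginal_likelihood(y, cov):
    """Gaussian log marginal likelihood of zero-mean data y under covariance cov."""
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    z = np.linalg.solve(chol, y)            # so y^T cov^-1 y = z^T z
    return (-0.5 * z @ z
            - np.sum(np.log(np.diag(chol)))
            - 0.5 * len(y) * np.log(2.0 * np.pi))


def fit_gp(t, y, oscillatory):
    """Grid-search maximum likelihood (assumed kernels, hypothetical helper).

    Aperiodic:   var * exp(-alpha*|tau|)                 (beta fixed at 0)
    Oscillatory: var * exp(-alpha*|tau|) * cos(beta*tau)  (beta = 0 included)
    Both add white measurement noise n2 * I.
    """
    y = y - y.mean()
    tau = np.abs(t[:, None] - t[None, :])
    var_y = y.var()
    alphas = np.logspace(-2, 1, 7)
    betas = np.linspace(0.0, 3.0, 61) if oscillatory else np.array([0.0])
    sig_vars = var_y * np.array([0.5, 1.0, 2.0])
    noise_vars = var_y * np.array([0.02, 0.1, 0.3])
    eye = np.eye(len(t))
    best_ll, best_params = -np.inf, None
    for alpha, beta in itertools.product(alphas, betas):
        shape = np.exp(-alpha * tau) * np.cos(beta * tau)
        for s2, n2 in itertools.product(sig_vars, noise_vars):
            ll = log_marginal_likelihood(y, s2 * shape + n2 * eye)
            if ll > best_ll:
                best_ll, best_params = ll, (alpha, beta, s2, n2)
    return best_ll, best_params


def llr_test(t, y):
    """Return (LLR, estimated period). Because the oscillatory grid contains
    the aperiodic grid (beta = 0), the LLR is never negative."""
    ll_aper, _ = fit_gp(t, y, oscillatory=False)
    ll_osc, (alpha, beta, _, _) = fit_gp(t, y, oscillatory=True)
    llr = 2.0 * (ll_osc - ll_aper)
    # A small damping alpha relative to beta indicates a more regular oscillation.
    period = 2.0 * np.pi / beta if beta > 0 else np.inf
    return llr, period


# Synthetic example data (not from the study).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 50.0, 100)
y_osc = np.sin(2.0 * np.pi * t / 5.0) + 0.3 * rng.standard_normal(t.size)

# Aperiodic control: exact discretisation of an OU process, plus noise.
a, dt = 0.2, t[1] - t[0]
x = np.empty(t.size)
x[0] = rng.standard_normal()
for i in range(1, t.size):
    x[i] = (x[i - 1] * np.exp(-a * dt)
            + np.sqrt(1.0 - np.exp(-2.0 * a * dt)) * rng.standard_normal())
y_aper = x + 0.3 * rng.standard_normal(t.size)

llr_osc, period_osc = llr_test(t, y_osc)
llr_aper, period_aper = llr_test(t, y_aper)
print(f"oscillatory: LLR={llr_osc:.1f}, period~{period_osc:.2f}")
print(f"aperiodic:   LLR={llr_aper:.1f}")
```

The nested design makes the comparison one-sided. The oscillatory model can always reproduce the aperiodic fit, so a large LLR means the cosine term explains structure that exponentially decaying correlations cannot, such as the anti-correlation at half a period.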