Department of Computer Science, University of California Irvine, Irvine, CA 92697, USA.
Bioinformatics. 2010 Mar 15;26(6):770-6. doi: 10.1093/bioinformatics/btq022. Epub 2010 Feb 9.
Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering gene expression patterns from experimental observations is made challenging by both measurement noise and variation in the replicates' rates of development. Prior work on this topic has focused on inferring expression patterns under the assumption that the replicate times are synchronized. We develop a statistical approach that simultaneously infers (i) the underlying (hidden) expression profile for each gene and (ii) the biological time of each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate.
We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both the profile shapes and the biological time of each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly lowering the mean squared error in a cross-validation study.
Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/
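The idea of jointly inferring expression profiles and per-replicate biological times can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the squared-exponential kernel, the fixed hyperparameters, the zero-mean Gaussian prior on time shifts, the coordinate-wise grid search, and all function names (`rbf_kernel`, `log_marginal_likelihood`, `objective`, `fit_time_shifts`, `gp_predict`) are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between time vectors a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def log_marginal_likelihood(t, Y, noise_var=0.01, **kern):
    """GP log evidence at sample times t, summed over genes (columns of Y)."""
    n, n_genes = Y.shape
    K = rbf_kernel(t, t, **kern) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    return (-0.5 * np.sum(Y * alpha)
            - n_genes * np.sum(np.log(np.diag(L)))
            - 0.5 * n * n_genes * np.log(2.0 * np.pi))

def objective(shifts, nominal_t, Y, shift_var=1.0, **gp):
    """Log evidence at the shifted times plus a Gaussian log prior on the shifts."""
    log_prior = -0.5 * np.sum(shifts ** 2) / shift_var
    return log_marginal_likelihood(nominal_t + shifts, Y, **gp) + log_prior

def fit_time_shifts(nominal_t, Y, candidates, shift_var=1.0, n_sweeps=3, **gp):
    """Coordinate-wise grid search: one shift per sample, shared by all genes."""
    shifts = np.zeros_like(nominal_t, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(shifts)):
            scores = []
            for c in candidates:
                trial = shifts.copy()
                trial[i] = c
                scores.append(objective(trial, nominal_t, Y, shift_var, **gp))
            shifts[i] = candidates[int(np.argmax(scores))]
    return shifts

def gp_predict(t_train, Y, t_test, noise_var=0.01, **kern):
    """Posterior mean expression profile of every gene at the times t_test."""
    K = rbf_kernel(t_train, t_train, **kern) + noise_var * np.eye(len(t_train))
    return rbf_kernel(t_test, t_train, **kern) @ np.linalg.solve(K, Y)
```

In this sketch, `Y` holds one row per replicate sample and one column per gene, and `nominal_t` holds the recorded sampling times. A call such as `fit_time_shifts(nominal_t, Y, np.arange(-5, 6) * 0.2)` returns one estimated shift per sample, and `gp_predict(nominal_t + shifts, Y, t_grid)` then evaluates the inferred profiles on a grid of times. Sharing each shift across all genes is what makes it estimable from a single measurement per gene; a fuller treatment would also learn the kernel hyperparameters rather than fixing them.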