Straube Jasmin, Gorse Alain-Dominique, Huang Bevan Emma, Lê Cao Kim-Anh
QFAB Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia; The University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, QLD, Australia.
QFAB Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia.
PLoS One. 2015 Aug 27;10(8):e0134540. doi: 10.1371/journal.pone.0134540. eCollection 2015.
Time course 'omics' experiments are becoming increasingly important to study system-wide dynamic regulation. Despite their high information content, analysis remains challenging. 'Omics' technologies capture quantitative measurements on tens of thousands of molecules. In a time course 'omics' experiment, these molecules are measured for multiple subjects at multiple time points. This results in a large, high-dimensional dataset, which requires computationally efficient approaches for statistical analysis. Moreover, methods need to be able to handle missing values and various levels of noise. We present a novel, robust and powerful framework to analyze time course 'omics' data that consists of three stages: quality assessment and filtering, profile modelling, and analysis. The first stage consists of removing molecules for which expression or abundance is highly variable over time. The second stage models each molecular expression profile in a linear mixed model framework which takes into account subject-specific variability. The best model is selected through a serial model selection approach and results in dimension reduction of the time course data. The final stage includes two types of analysis of the modelled trajectories: clustering analysis to identify groups of profiles that are correlated over time, and differential expression analysis to identify profiles which differ over time and/or between treatment groups. Through simulation studies we demonstrate the high sensitivity and specificity of our approach for differential expression analysis. We then illustrate how our framework can bring novel insights on two time course 'omics' studies in breast cancer and kidney rejection. The methods are publicly available, implemented in the R CRAN package lmms.
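As a rough illustration of the first stage (quality assessment and filtering), the sketch below flags molecules whose spread among measurements taken at the same time point is about as large as their spread over the whole time course. It is not the filter implemented in lmms: the functions `noise_ratio` and `filter_molecules`, the ratio itself, and the `max_ratio` threshold of 0.9 are assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def noise_ratio(times, values):
    """Average within-time-point SD divided by the overall SD of one molecule.

    Values near 0 suggest a clear trend over time; values near 1 suggest the
    profile is dominated by noise. Missing measurements are given as None.
    """
    by_time = {}
    for t, v in zip(times, values):
        if v is None:  # skip missing values
            continue
        by_time.setdefault(t, []).append(v)

    observed = [v for vs in by_time.values() for v in vs]
    if not observed:
        return float("inf")  # nothing measured: treat as uninformative

    overall_sd = pstdev(observed)
    if overall_sd == 0:
        return 0.0  # constant profile: no noise to measure

    within_sd = mean(pstdev(vs) for vs in by_time.values())
    return within_sd / overall_sd


def filter_molecules(times, profiles, max_ratio=0.9):
    """Keep only molecules whose noise ratio does not exceed max_ratio (hypothetical cutoff)."""
    return {
        name: vals
        for name, vals in profiles.items()
        if noise_ratio(times, vals) <= max_ratio
    }


# Three subjects measured at three time points (toy data, not from the study).
times = [1, 1, 1, 2, 2, 2, 3, 3, 3]
profiles = {
    "trend": [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.1, 2.9],
    "noisy": [1.0, 3.0, 2.0, 3.0, 1.0, 2.0, 2.0, 3.0, 1.0],
    "gappy": [1.0, None, 0.9, 2.0, 2.1, None, 3.0, 3.1, 2.9],
}
kept = filter_molecules(times, profiles)
print(sorted(kept))
```

In this toy example the molecule with a consistent increase over time is retained even when some measurements are missing, while the molecule whose values vary as much within a time point as across the whole series is dropped. The retained profiles would then go on to the modelling stage.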