Srivastava Sanvesh, Wang Wenyi, Manyam Ganiraju, Ordonez Carlos, Baladandayuthapani Veerabhadran
Department of Biostatistics, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Unit 1411, Houston, Texas, USA.
EURASIP J Bioinform Syst Biol. 2013 Jun 28;2013(1):9. doi: 10.1186/1687-4153-2013-9.
Recent advances in genome technologies, and the resulting collection of genomic information at various molecular resolutions, hold promise to accelerate the discovery of new therapeutic targets. A critical step toward this goal is to develop efficient clinical prediction models that integrate these diverse sources of high-throughput data. This step is challenging because the data are high-dimensional and contain complex interactions. For predicting relevant clinical outcomes, we propose a flexible statistical machine learning approach that models the interactions between platform-specific measurements through nonlinear kernel machines and borrows information within and between platforms through a hierarchical Bayesian framework. The model's parameters have direct interpretations in terms of the effects of platforms and of data interactions within and across platforms. Parameters are estimated with a computationally efficient variational Bayes algorithm that scales well to large high-throughput datasets.
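To make the idea of platform-specific kernels with a cross-platform interaction term concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel per platform, represents the interaction by the element-wise product of the two kernel matrices, and substitutes plain kernel ridge regression for the hierarchical Bayesian model and variational Bayes estimation described above. All function names, weights, bandwidths, and the synthetic data are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, bandwidth):
    """Gaussian kernel matrix exp(-||x_i - x_j||^2 / bandwidth) for rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / bandwidth)


def integrative_kernel(K_a, K_b, weights=(1.0, 1.0, 1.0)):
    """Combine two platform kernels plus their element-wise product.

    The product term stands in for between-platform interaction; the
    weights are illustrative constants, not estimated parameters.
    """
    w_a, w_b, w_ab = weights
    return w_a * K_a + w_b * K_b + w_ab * (K_a * K_b)


def fit_kernel_ridge(K, y, ridge):
    """Solve (K + ridge * I) alpha = y (kernel ridge, assumed stand-in)."""
    n = K.shape[0]
    return np.linalg.solve(K + ridge * np.eye(n), y)


# Synthetic stand-ins for two platforms measured on the same patients.
rng = np.random.default_rng(0)
n = 40
X_mrna = rng.normal(size=(n, 50))   # hypothetical mRNA expression features
X_mirna = rng.normal(size=(n, 20))  # hypothetical microRNA features
y = rng.normal(size=n)              # hypothetical log survival times

K_mrna = gaussian_kernel(X_mrna, bandwidth=50.0)
K_mirna = gaussian_kernel(X_mirna, bandwidth=20.0)
K = integrative_kernel(K_mrna, K_mirna)

alpha = fit_kernel_ridge(K, y, ridge=1.0)
fitted = K @ alpha  # in-sample fitted values
```

Because sums and element-wise products of positive semidefinite kernels are themselves positive semidefinite, the combined matrix remains a valid kernel. In this sketch the weights are fixed by hand, whereas the approach in the abstract estimates interpretable platform and interaction effects within its Bayesian framework.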
We apply our methods to the glioblastoma multiforme (GBM) dataset from The Cancer Genome Atlas (TCGA), integrating gene/mRNA expression and microRNA profiles to predict patient survival times. In terms of prediction accuracy, we show that our nonlinear, interaction-based integrative methods perform better than linear alternatives and than non-integrative methods that do not account for interactions between the platforms. We also find several prognostic mRNAs and microRNAs that are related to tumor invasion and are known to drive tumor metastasis and severe inflammatory response in GBM. In addition, our analysis reveals several mRNA and microRNA interactions that have known implications in the etiology of GBM.
Our approach gains its flexibility and power by modeling the nonlinear interaction structures between and within the platforms. Our framework is a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers. Our software is freely available at http://odin.mdacc.tmc.edu/~vbaladan.