Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA.
The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA.
BMC Med Genomics. 2018 Sep 14;11(Suppl 3):71. doi: 10.1186/s12920-018-0388-0.
Large-scale collaborative precision medicine initiatives such as The Cancer Genome Atlas (TCGA) are yielding rich multi-omics data. Integrative analyses of these data, which span somatic mutation, copy number alteration (CNA), DNA methylation, miRNA, gene expression, and protein expression, could advance precision medicine in cancer prevention, diagnosis, and treatment by improving our understanding of the underlying mechanisms and enabling the discovery of novel biomarkers for different types of cancer. However, such analyses face a number of challenges, including the heterogeneity and high dimensionality of omics data.
We propose a novel framework for multi-omics data integration based on multi-view feature selection. At its core is MRMR-mv, a multi-view feature selection algorithm that adapts the well-known Minimum-Redundancy Maximum-Relevance (MRMR) single-view feature selection algorithm to the multi-view setting.
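For orientation, the single-view MRMR idea that MRMR-mv builds on can be sketched as a greedy search that repeatedly picks the feature with the highest relevance to the label minus its average redundancy with the features already chosen. The sketch below is not MRMR-mv itself, whose multi-view details are not given in this abstract. It assumes discrete features, empirical mutual information as both the relevance and the redundancy measure, and the difference form of the criterion. The function names and the toy feature names are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy single-view MRMR (difference form, an assumption of this sketch).

    features: dict mapping a feature name to its discrete column.
    labels:   discrete class labels, one per sample.
    k:        number of features to select.
    """
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = set(features)

    def score(name):
        # Relevance to the label minus mean redundancy with selected features.
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): two identical expression features and one
# methylation feature that is less relevant but far less redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "expr_gene1": [0, 0, 0, 1, 1, 1, 1, 1],
    "expr_gene1_dup": [0, 0, 0, 1, 1, 1, 1, 1],
    "meth_site1": [0, 1, 0, 0, 1, 0, 1, 1],
}
chosen = mrmr_select(features, labels, k=2)
print(chosen)
```

On this toy input the duplicated expression feature is penalized for redundancy, so the second pick is the methylation feature even though its individual relevance is lower. A multi-view variant would additionally have to decide how candidates from different omics views compete, which this sketch leaves out.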
We report results of experiments using an ovarian cancer multi-omics dataset derived from the TCGA database on the task of predicting ovarian cancer survival. Our results suggest that multi-view models outperform both view-specific models (i.e., models trained and tested using a single type of omics data) and models based on two baseline data fusion methods.
Our results demonstrate the potential of multi-view feature selection in integrative analyses and predictive modeling from multi-omics data.