Kim Dokyoon, Shin Hyunjung, Sohn Kyung-Ah, Verma Anurag, Ritchie Marylyn D, Kim Ju Han
Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Republic of Korea; Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA.
Department of Industrial & Information Systems Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749 Suwon, Republic of Korea.
Methods. 2014 Jun 1;67(3):344-53. doi: 10.1016/j.ymeth.2014.02.003. Epub 2014 Feb 18.
To improve our understanding of cancer and develop multi-layered theoretical models of its underlying mechanisms, it is essential to better understand the interactions between the multiple levels of genomic data that contribute to tumor formation and progression. Recent approaches exist, such as a graph-based framework that integrates multi-omics data (copy number alteration, methylation, gene expression, and miRNA data) for cancer clinical outcome prediction. However, most previous methods treat each type of genomic data as independent, and the possible interplay between them is not explicitly incorporated into the model. Cancer involves dysregulation at multiple levels of the biological system, spanning the genomic, epigenomic, transcriptomic, and proteomic levels. Genomic features are therefore likely to interact with features at other genomic levels, and it would be desirable to incorporate such inter-relationship information when integrating multi-omics data for cancer clinical outcome prediction. In this study, we propose a new graph-based framework that integrates not only multi-omics data but also the inter-relationships between them, in order to better elucidate cancer clinical outcomes. To assess the validity of the proposed framework, serous cystadenocarcinoma data from TCGA were adopted as a pilot task. The proposed model, which incorporates inter-relationships between different genomic features, showed significantly improved performance compared to the model that does not consider inter-relationships when integrating multi-omics data. For example, for the pair of miRNA and gene expression data, the model integrating miRNA, gene expression, and the inter-relationship between them achieved an AUC of 0.8476 (REI), outperforming the model combining miRNA and gene expression data alone (AUC of 0.8404). Similar results were obtained for other pairs of genomic data levels.
Integrating different levels of data together with the inter-relationships between them can aid in extracting new biological knowledge by drawing an integrative conclusion from many pieces of information collected from diverse types of genomic data, eventually leading to more effective screening strategies and alternative therapies that may improve outcomes.
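To make the idea of graph-based integration concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. It assumes one sample-similarity graph per data type, a hypothetical inter-relationship graph built from products of paired miRNA and gene expression features, equal-weight averaging of the graphs, and a standard Laplacian-regularized label propagation step. All function names, the pairing scheme, and the parameters `k` and `mu` are illustrative assumptions; the abstract does not specify how the graphs are built or combined.

```python
import numpy as np

def rbf_knn_graph(X, k=5):
    """Symmetric k-nearest-neighbour similarity graph over samples (rows of X)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma2 = np.median(d2[d2 > 0])           # bandwidth heuristic (assumption)
    W = np.exp(-d2 / sigma2)
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]      # k strongest neighbours per sample
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(W.shape[0])[:, None], idx] = True
    return W * (mask | mask.T)               # keep an edge if either end chose it

def propagate(W, y, mu=1.0):
    """Solve (I + mu * L) f = y with the graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + mu * L, y)

def auc(scores, labels):
    """Rank-based AUC for labels in {+1, -1}; ties count as one half."""
    pos, neg = scores[labels == 1], scores[labels == -1]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def run_demo(seed=0, n=60):
    rng = np.random.default_rng(seed)
    labels = np.where(np.arange(n) % 2 == 0, 1, -1)
    # Synthetic stand-ins for two omics levels (not TCGA data).
    X_mirna = rng.normal(size=(n, 8)) + 0.5 * labels[:, None]
    X_expr = rng.normal(size=(n, 8)) + 0.5 * labels[:, None]
    # Hypothetical miRNA-gene pairs; inter-relationship features are products.
    pairs = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])
    X_inter = X_mirna[:, pairs[:, 0]] * X_expr[:, pairs[:, 1]]

    W_m, W_e, W_i = (rbf_knn_graph(X) for X in (X_mirna, X_expr, X_inter))
    test = np.arange(40, n)                  # held-out samples
    y = labels.astype(float)
    y[test] = 0.0                            # hide test labels before propagation

    f_base = propagate((W_m + W_e) / 2.0, y)          # data graphs only
    f_inter = propagate((W_m + W_e + W_i) / 3.0, y)   # plus inter-relationship graph
    return auc(f_base[test], labels[test]), auc(f_inter[test], labels[test])

auc_base, auc_inter = run_demo()
print(f"AUC without inter-relationship graph: {auc_base:.4f}")
print(f"AUC with inter-relationship graph:    {auc_inter:.4f}")
```

On this toy data the two AUC values carry no biological meaning and need not reproduce the ordering reported above; the sketch only illustrates where an inter-relationship graph would enter such a pipeline.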