Jung Inuk, Kim Minsu, Rhee Sungmin, Lim Sangsoo, Kim Sun
Department of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea.
Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, United States.
Front Genet. 2021 Sep 10;12:682841. doi: 10.3389/fgene.2021.682841. eCollection 2021.
Multi-omics data are frequently measured to deepen the understanding of biological mechanisms underlying certain phenotypes. However, due to the complex relations and high dimensionality of multi-omics data, it is difficult to associate omics features with specific biological traits of interest. For example, clinically valuable breast cancer subtypes are well-defined at the molecular level but are poorly classified using gene expression data. Here, we propose a multi-omics analysis method called MONTI (Multi-Omics Non-negative Tensor decomposition for Integrative analysis), whose goal is to select multi-omics features that represent trait-specific characteristics. We demonstrate the strength of integrated multi-omics analysis in terms of cancer subtyping. The multi-omics data are first integrated in a biologically meaningful manner to form a three-dimensional tensor, which is then decomposed using a non-negative tensor decomposition method. From the result, MONTI selects highly informative, subtype-specific multi-omics features. MONTI was applied to three case studies: cohorts of 597 breast cancer, 314 colon cancer, and 305 stomach cancer cases. For all case studies, we found that subtype classification accuracy significantly improved when all available multi-omics data were used. MONTI was able to detect subtype-specific gene sets that were shown to be strongly regulated by certain omics types, from which correlations between omics types could be inferred. Furthermore, various clinical attributes of nine cancer types were analyzed using MONTI, showing that some clinical attributes could be well explained by multi-omics data. We demonstrated that integrating multi-omics data in a gene-centric manner improves the detection of cancer subtype-specific features and other clinical features, which may be used to further understand the molecular characteristics of interest. The software and data used in this study are available at: https://github.com/inukj/MONTI.
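The abstract states only that a three-dimensional tensor is built and then split by "a non-negative tensor decomposition method". As a minimal sketch, assuming a gene × patient × omics tensor and a rank-R non-negative CP (PARAFAC) model fitted with Lee-Seung-style multiplicative updates, the decomposition step might look like the code below. This is a stand-in technique, not necessarily the solver MONTI uses (see the repository for the actual implementation); `ntf`, `reconstruct`, `relative_error`, and `top_loading_genes` are hypothetical names, and ranking genes by component loading is only an illustrative selection rule.

```python
import random


def gram(M):
    """Return the R x R Gram matrix M^T M of an n x R matrix stored as a list of rows."""
    R = len(M[0])
    return [[sum(row[r] * row[s] for row in M) for s in range(R)] for r in range(R)]


def reconstruct(A, B, C):
    """CP model: X_hat[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[sum(a[r] * b[r] * c[r] for r in range(R)) for c in C] for b in B] for a in A]


def relative_error(X, A, B, C):
    """Frobenius-norm reconstruction error ||X - X_hat|| / ||X||."""
    num = den = 0.0
    for Xi, Hi in zip(X, reconstruct(A, B, C)):
        for Xij, Hij in zip(Xi, Hi):
            for x, h in zip(Xij, Hij):
                num += (x - h) ** 2
                den += x ** 2
    return (num / den) ** 0.5


def ntf(X, rank, n_iter=300, eps=1e-12, seed=0):
    """Non-negative CP decomposition of a 3-way tensor X (I x J x K) by multiplicative updates.

    Assumed modes: I genes, J patients, K omics types. Each factor update is the
    Lee-Seung NMF rule for one unfolding, so factors stay non-negative and the
    squared error does not increase.
    """
    rng = random.Random(seed)
    I, J, K = len(X), len(X[0]), len(X[0][0])
    A = [[rng.random() for _ in range(rank)] for _ in range(I)]  # gene factors
    B = [[rng.random() for _ in range(rank)] for _ in range(J)]  # patient factors
    C = [[rng.random() for _ in range(rank)] for _ in range(K)]  # omics factors
    R = range(rank)
    for _ in range(n_iter):
        # Gene mode: A <- A * (X_(1) khatri_rao(C, B)) / (A (B^T B * C^T C))
        GB, GC = gram(B), gram(C)
        for i in range(I):
            num = [sum(X[i][j][k] * B[j][r] * C[k][r] for j in range(J) for k in range(K)) for r in R]
            den = [sum(A[i][s] * GB[s][r] * GC[s][r] for s in R) for r in R]
            A[i] = [A[i][r] * num[r] / (den[r] + eps) for r in R]
        # Patient mode, using the freshly updated A
        GA = gram(A)
        for j in range(J):
            num = [sum(X[i][j][k] * A[i][r] * C[k][r] for i in range(I) for k in range(K)) for r in R]
            den = [sum(B[j][s] * GA[s][r] * GC[s][r] for s in R) for r in R]
            B[j] = [B[j][r] * num[r] / (den[r] + eps) for r in R]
        # Omics mode, using the updated A and B
        GB = gram(B)
        for k in range(K):
            num = [sum(X[i][j][k] * A[i][r] * B[j][r] for i in range(I) for j in range(J)) for r in R]
            den = [sum(C[k][s] * GA[s][r] * GB[s][r] for s in R) for r in R]
            C[k] = [C[k][r] * num[r] / (den[r] + eps) for r in R]
    return A, B, C


def top_loading_genes(A, component, k):
    """Hypothetical selection rule: indices of the k genes with the largest loading in one component."""
    order = sorted(range(len(A)), key=lambda i: A[i][component], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Synthetic rank-2 tensor: 6 genes x 5 patients x 3 omics types.
    A0 = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.1], [0.1, 1.0], [0.2, 0.9], [0.1, 0.8]]
    B0 = [[1.0, 0.1], [0.9, 0.1], [0.1, 1.0], [0.2, 0.8], [0.5, 0.5]]
    C0 = [[1.0, 0.2], [0.3, 1.0], [0.6, 0.6]]
    X = reconstruct(A0, B0, C0)
    A, B, C = ntf(X, rank=2)
    print("relative error:", relative_error(X, A, B, C))
    print("top genes per component:", [top_loading_genes(A, r, 3) for r in range(2)])
```

In this sketch the patient factor matrix `B` is the natural input for a downstream subtype classifier, while `A` links each component back to genes; how MONTI itself chooses components and genes is described in the paper and repository, not here.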