Fiorentino Giuseppe, Visintainer Roberto, Domenici Enrico, Lauria Mario, Marchetti Luca
Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), 38068 Rovereto, Italy.
Department of Cellular, Computational, and Integrative Biology (CiBio), University of Trento, 38123 Povo, Italy.
Cancers (Basel). 2021 Jul 8;13(14):3423. doi: 10.3390/cancers13143423.
High-throughput technologies produce large amounts of data representing different biological layers, such as genomics, proteomics, metabolomics and transcriptomics. Individual omics datasets have been investigated separately to understand the molecular bases of various diseases, but single-omics analysis may not fully capture the molecular mechanisms and multilayer regulatory processes underlying complex diseases, especially cancer. To overcome this problem, several multi-omics integration methods have been introduced, but a commonly agreed standard of analysis is still lacking. In this paper, we present MOUSSE, a novel normalization-free pipeline for unsupervised multi-omics integration. Its main innovations are the use of rank-based subject-specific signatures and the use of those signatures to derive subject similarity networks. A separate similarity network is derived for each omics layer, and the resulting networks are then merged in a way that accounts for their informative content. We applied MOUSSE to analyze survival in ten different types of cancer. It produced a meaningful clustering of the subjects and obtained a higher average classification score than ten state-of-the-art algorithms tested on the same data. As further validation, we extracted from the subject-specific signatures a list of relevant features used for the clustering and investigated their biological role in survival. According to the literature, these features are highly involved in cancer progression and differential survival.
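The idea of rank-based signatures feeding per-omics similarity networks can be illustrated with a minimal sketch. This is not the MOUSSE implementation: the abstract does not specify the similarity measure or the merging rule, so the Jaccard overlap, the variance-based weights, the signature size `k`, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np

def rank_signatures(X, k):
    """X is a (subjects x features) matrix for one omics layer.
    Features are ranked within each subject, so no cross-subject
    normalization is required. Returns the k highest- and k
    lowest-ranked feature indices per subject (hypothetical signature)."""
    order = np.argsort(X, axis=1)  # ascending order within each subject
    top = [set(row[-k:].tolist()) for row in order]
    bottom = [set(row[:k].tolist()) for row in order]
    return top, bottom

def jaccard(a, b):
    """Jaccard overlap of two index sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_network(X, k):
    """Subject-by-subject similarity matrix from signature overlap."""
    top, bottom = rank_signatures(X, k)
    n = X.shape[0]
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            s = 0.5 * (jaccard(top[i], top[j]) + jaccard(bottom[i], bottom[j]))
            S[i, j] = S[j, i] = s
    return S

def merge_networks(networks):
    """Weighted average of the per-omics networks. The weight is the
    variance of the off-diagonal similarities, an assumed stand-in
    for 'informative content'."""
    n = networks[0].shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    weights = np.array([S[off_diagonal].var() for S in networks])
    if weights.sum() == 0:
        weights = np.ones(len(networks))  # fall back to equal weights
    weights = weights / weights.sum()
    return sum(w * S for w, S in zip(weights, networks))

# Toy usage with two synthetic omics layers measured on the same 6 subjects.
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(6, 50))
layer_b = rng.normal(size=(6, 30))
merged = merge_networks([similarity_network(layer_a, k=5),
                         similarity_network(layer_b, k=5)])
```

The merged matrix could then be passed to any clustering method that accepts a precomputed similarity or affinity matrix; the choice of clustering algorithm is likewise left open here.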