Planell Nuria, Lagani Vincenzo, Sebastian-Leon Patricia, van der Kloet Frans, Ewing Ewoud, Karathanasis Nestoras, Urdangarin Arantxa, Arozarena Imanol, Jagodic Maja, Tsamardinos Ioannis, Tarazona Sonia, Conesa Ana, Tegner Jesper, Gomez-Cabrero David
Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain.
Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia.
Front Genet. 2021 Mar 4;12:620453. doi: 10.3389/fgene.2021.620453. eCollection 2021.
Technologies for profiling samples across different omics platforms have been at the forefront of research since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, despite a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch, with an arbitrary decision over which tools to use and how to combine them. There is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), intended to be as generic as possible for multi-omics analysis, that combines available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and multi-omics exploratory analysis) in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two case studies from The Cancer Genome Atlas (TCGA). For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (beyond that of the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and when not all samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step by step as open source in the STATegRa Bioconductor package.