Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005, Paris, France.
Aix Marseille Univ, INSERM, MMG, Marseille Medical Genetics, CNRS, Turing Center for Living Systems, Marseille, France.
Nat Commun. 2021 Jan 5;12(1):124. doi: 10.1038/s41467-020-20430-7.
High-dimensional multi-omics data are now standard in biology. When effectively integrated, they can greatly enhance our understanding of biological systems. Joint Dimensionality Reduction (jDR) methods are among the most efficient approaches to this integration. However, many jDR methods are available, which creates the need for a comprehensive benchmark with practical guidelines. We perform a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluate their performance in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we use TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assess their classification of multi-omics single-cell data. From these in-depth comparisons, we observe that intNMF performs best in clustering, while MCIA performs well across many contexts. The code developed for this benchmark study is implemented in a Jupyter notebook, multi-omics mix (momix), to foster reproducibility and to support users and future developers.
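The first benchmark (recovering a ground-truth sample clustering from simulated multi-omics data) can be illustrated with a minimal sketch. It is not the momix code and does not implement any of the nine benchmarked methods: a truncated SVD of the scaled, concatenated omics blocks stands in for a jDR method, and the adjusted Rand index is assumed here as the agreement score. All function names, dataset sizes and parameters below are hypothetical and chosen only for illustration.

```python
import numpy as np
from math import comb


def simulate_omics(labels, n_features, shift, rng):
    """One simulated omics block: unit Gaussian noise plus cluster-specific means."""
    k = labels.max() + 1
    centers = rng.normal(0.0, shift, size=(k, n_features))
    return centers[labels] + rng.normal(0.0, 1.0, size=(labels.size, n_features))


def joint_factors(blocks, n_factors):
    """Stand-in jDR (assumption): center and scale each block, concatenate, take top SVD factors."""
    scaled = []
    for x in blocks:
        x = x - x.mean(axis=0)
        scaled.append(x / np.linalg.norm(x))  # give each omics the same total weight
    u, s, _ = np.linalg.svd(np.hstack(scaled), full_matrices=False)
    return u[:, :n_factors] * s[:n_factors]  # samples x factors


def kmeans(x, k, rng, n_iter=50, n_init=10):
    """Plain Lloyd k-means with several random restarts; returns the best labelling."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = x[labels == j].mean(axis=0)
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        cost = dist.min(axis=1).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings, computed from the contingency table."""
    a, b = np.asarray(a), np.asarray(b)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)
    sum_cells = sum(comb(int(v), 2) for v in table.ravel())
    sum_rows = sum(comb(int(v), 2) for v in table.sum(axis=1))
    sum_cols = sum(comb(int(v), 2) for v in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(a.size, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


rng = np.random.default_rng(0)
truth = np.repeat(np.arange(3), 30)  # 90 samples in 3 ground-truth clusters
blocks = [simulate_omics(truth, p, shift=1.0, rng=rng) for p in (200, 100, 50)]
factors = joint_factors(blocks, n_factors=2)
predicted = kmeans(factors, k=3, rng=rng)
ari = adjusted_rand_index(truth, predicted)
print(f"ARI between ground truth and recovered clusters: {ari:.3f}")
```

In an actual comparison, `joint_factors` would be replaced by each jDR method in turn, with the same simulated blocks and the same clustering and scoring steps applied to every method's factor matrix.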