GAUDI: interpretable multi-omics integration with UMAP embeddings and density-based clustering.

Authors

Castellano-Escuder Pol, Zachman Derek K, Han Kevin, Hirschey Matthew D

Affiliations

Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC, USA.

Duke Department of Pediatrics, Division of Hematology-Oncology, Duke University School of Medicine, Durham, NC, USA.

Publication information

Nat Commun. 2025 Jul 1;16(1):5771. doi: 10.1038/s41467-025-60822-1.

PMID:40593592

Abstract

Integrating high-dimensional cellular multi-omics data is crucial for understanding various layers of biological control. Single 'omic methods provide important insights, but often fall short in handling the complex relationships between genes, proteins, metabolites and beyond. Here, we present a novel, non-linear, and unsupervised method called GAUDI (Group Aggregation via UMAP Data Integration) that leverages independent UMAP embeddings for the concurrent analysis of multiple data types. GAUDI uncovers non-linear relationships among different omics data better than several state-of-the-art methods. This approach not only clusters samples by their multi-omic profiles but also identifies latent factors across each omics dataset, thereby enabling interpretation of the underlying features contributing to each cluster. Consequently, GAUDI facilitates more intuitive, interpretable visualizations to identify novel insights and potential biomarkers from a wide range of experimental designs.
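The general pattern the abstract describes (embed each omics layer independently, combine the per-layer embeddings, group samples by density, then trace each cluster back to the features driving it) can be illustrated with a small Python sketch. This is not GAUDI's implementation or API. To stay dependency-light it swaps in stand-ins: PCA computed with NumPy's SVD replaces UMAP, and plain DBSCAN replaces GAUDI's density-based clustering step. The helpers `embed_layer`, `dbscan`, `integrate`, `top_features`, the per-layer rescaling, the mean z-score ranking, and the toy data are all hypothetical illustrations.

```python
import numpy as np


def embed_layer(X, n_components=2):
    """Stand-in for a per-omics UMAP embedding (assumption: PCA via SVD).

    X is a samples x features matrix for one omics layer. Features are
    z-scored, projected onto the top principal components, and the result is
    divided by its overall standard deviation so that layers with many
    features do not dominate the concatenated space.
    """
    Xz = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    U, S, _ = np.linalg.svd(Xz, full_matrices=False)
    E = U[:, :n_components] * S[:n_components]
    return E / E.std()


def dbscan(Z, eps, min_samples):
    """Plain DBSCAN (stand-in for the density-based clustering step).

    Returns one integer label per row of Z; -1 marks noise points.
    """
    n = len(Z)
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    next_label = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from an unlabeled core point.
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join the cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = next_label
                    stack.append(k)
        next_label += 1
    return labels


def integrate(layers, n_components=2, eps=2.0, min_samples=3):
    """Embed each layer independently, concatenate, then cluster samples."""
    Z = np.hstack([embed_layer(X, n_components) for X in layers])
    return Z, dbscan(Z, eps, min_samples)


def top_features(X, labels, cluster, k=5):
    """Hypothetical interpretation step: rank one layer's features by the
    absolute difference in mean z-score between `cluster` and all other samples."""
    Xz = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    in_cluster = labels == cluster
    diff = Xz[in_cluster].mean(axis=0) - Xz[~in_cluster].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]


# Toy data: 40 samples in two groups, two omics layers of different widths.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 20)


def make_layer(n_features, n_informative, shift=6.0):
    X = rng.normal(size=(len(groups), n_features))
    X[groups == 1, :n_informative] += shift  # only the first features differ
    return X


layers = [make_layer(20, 10), make_layer(30, 15)]
Z, labels = integrate(layers)
print(labels)
print(top_features(layers[0], labels, labels[-1], k=10))
```

Because PCA is linear, this toy version cannot recover the non-linear relationships that UMAP is meant to capture. Replacing `embed_layer` with the umap-learn package's `umap.UMAP(n_components=2).fit_transform` would move the sketch closer to the approach described in the abstract, at the cost of an extra dependency and less predictable output.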