MIDAA: deep archetypal analysis for interpretable multi-omic data integration based on biological principles.

Authors

Milite Salvatore, Caravagna Giulio, Sottoriva Andrea

Affiliations

Computational Biology Research Centre, Human Technopole, Milan, Italy.

Department of Mathematics, Informatics and Geosciences, University of Trieste, Trieste, Italy.

Publication

Genome Biol. 2025 Apr 8;26(1):90. doi: 10.1186/s13059-025-03530-9.

Abstract

High-throughput multi-omic molecular profiling allows the probing of biological systems at unprecedented resolution. However, integrating and interpreting high-dimensional, sparse, and noisy multimodal datasets remains challenging. Deriving new biological insights with current methods is difficult because they are not rooted in biological principles but prioritise tasks like dimensionality reduction. Here, we introduce a framework that combines archetypal analysis, an approach grounded in biological principles, with deep learning. Using archetypes based on evolutionary trade-offs and Pareto optimality, MIDAA finds extreme data points that define the geometry of the latent space, preserving the complexity of biological interactions while retaining an interpretable output. We demonstrate that these extreme points represent cellular programmes reflecting the underlying biology. Moreover, we show that, compared to alternative methods, MIDAA can identify parsimonious, interpretable, and biologically relevant patterns from real and simulated multi-omics.
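To make the idea of "extreme data points that define the geometry" concrete, here is a minimal sketch of classical linear archetypal analysis in Python with NumPy. It is not MIDAA's deep-learning implementation, which the abstract does not specify; it only illustrates the underlying principle that each sample is approximated as a convex mixture of a few archetypes, and each archetype is itself a convex mixture of samples. The function names (`project_simplex`, `archetypal_analysis`), the alternating projected-gradient solver, and the toy triangle data are all assumptions chosen for illustration.

```python
import numpy as np


def project_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, m = V.shape
    U = np.sort(V, axis=1)[:, ::-1]            # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, m + 1)
    rho = (U - css / idx > 0).sum(axis=1)      # size of the support of each projection
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift
    return np.maximum(V - theta[:, None], 0.0)


def archetypal_analysis(X, k, n_iter=500, seed=0):
    """Approximate X (n, d) as A @ Z with archetypes Z = B @ X.

    Rows of A (n, k) and B (k, n) are constrained to the simplex, so
    archetypes lie in the convex hull of the data and every sample is
    reconstructed as a convex mixture of archetypes.
    Hypothetical helper: alternating projected gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_simplex(rng.random((n, k)))
    B = project_simplex(rng.random((k, n)))
    x_norm_sq = np.linalg.norm(X, 2) ** 2      # squared spectral norm of X
    eps = 1e-12
    for _ in range(n_iter):
        # Update mixture weights A with the archetypes held fixed.
        Z = B @ X
        grad_A = (A @ Z - X) @ Z.T
        step_A = 1.0 / (np.linalg.norm(Z, 2) ** 2 + eps)
        A = project_simplex(A - step_A * grad_A)
        # Update archetype weights B with A held fixed.
        grad_B = A.T @ (A @ B @ X - X) @ X.T
        step_B = 1.0 / (np.linalg.norm(A, 2) ** 2 * x_norm_sq + eps)
        B = project_simplex(B - step_B * grad_B)
    return A, B, B @ X


# Toy data: points sampled inside a triangle, whose corners play the
# role of three "pure" programmes trading off against each other.
rng = np.random.default_rng(42)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X = rng.dirichlet(np.ones(3), size=200) @ corners

A, B, Z = archetypal_analysis(X, k=3)
print("archetypes:\n", Z)
print("mean squared reconstruction error:", np.mean((X - A @ Z) ** 2))
```

In this sketch the rows of `A` give each sample's mixture over archetypes, which is the interpretable output; a deep variant would, under the same constraints, learn the mapping through neural encoders and decoders rather than through linear products.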

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d177/11980162/f72137fcc8c1/13059_2025_3530_Fig1_HTML.jpg
