An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics.

Affiliations

Delft Bioinformatics Lab, Delft University of Technology.

Department of Medical Oncology, Erasmus University Medical Center.

Publication information

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad416.

Abstract

Multi-omic analyses are necessary both to understand the complex biological processes taking place at the tissue and cell level and to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but neural architectures that embed paired -omics into the same non-linear manifold have recently grown in popularity. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage over linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data led to the following insights. First, concatenating the principal components of each modality is a competitive baseline that is hard to beat if all modalities are available at test time. However, if only one modality is available at test time, training a predictive model on the joint space of that modality can improve performance relative to using only the unimodal principal components. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons indicate which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities performed poorly.
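Two of the approaches named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no implementation details, so the function names (`pca_fit`, `pca_transform`, `concat_pcs`, `product_of_experts`), the random data, the feature counts, and the number of components are all hypothetical. The first part builds the baseline by fitting PCA per modality and placing the resulting components side by side. The second part shows the standard way to multiply diagonal Gaussian experts, which is assumed here to be what "product-of-experts" refers to; any prior expert a particular model might include is left out.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA on the rows of X; return the feature means and the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto previously fitted principal axes."""
    return (X - mean) @ components.T


def concat_pcs(modalities, fitted):
    """Baseline joint representation: per-modality PCs placed side by side."""
    return np.hstack(
        [pca_transform(X, m, c) for X, (m, c) in zip(modalities, fitted)]
    )


def product_of_experts(mus, logvars):
    """Fuse diagonal Gaussian experts: precisions add, means are precision-weighted."""
    precisions = np.exp(-np.asarray(logvars))
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (np.asarray(mus) * precisions).sum(axis=0)
    return joint_mu, joint_var


rng = np.random.default_rng(0)
# Hypothetical paired data: 50 samples measured in two modalities.
rna = rng.normal(size=(50, 200))
meth = rng.normal(size=(50, 120))

# Fit PCA separately per modality, then concatenate the 10 + 10 components.
fitted = [pca_fit(rna, 10), pca_fit(meth, 10)]
joint = concat_pcs([rna, meth], fitted)

# Two unit-variance experts over a 3-dimensional latent space.
mu, var = product_of_experts(
    mus=[np.zeros(3), np.full(3, 2.0)],
    logvars=[np.zeros(3), np.zeros(3)],
)
```

In this sketch `joint` has one row per sample and 20 columns, and a downstream classifier or survival model would be trained on it. In the product-of-experts example, the two equally confident experts yield a fused mean halfway between their means and a variance half as large as either. Dropping an expert from the lists is one way such a model can handle a modality that is missing at test time.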

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1775/10685331/87d9f3fb6663/bbad416f1.jpg
