Similarity-assisted variational autoencoder for nonlinear dimension reduction with application to single-cell RNA sequencing data.

Affiliations

Graduate School of Data Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.

Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.

Publication information

BMC Bioinformatics. 2023 Nov 14;24(1):432. doi: 10.1186/s12859-023-05552-1.

DOI:10.1186/s12859-023-05552-1

PMID:37964243

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10647110/

Abstract

BACKGROUND

Deep generative models serve naturally as nonlinear dimension reduction tools for visualizing large-scale datasets, such as single-cell RNA sequencing datasets, to reveal latent grouping patterns or identify outliers. The variational autoencoder (VAE) is a popular deep generative method equipped with an encoder/decoder structure. The encoder maps a new sample to the latent space, and the decoder generates a data point from a point in the latent space. However, the VAE tends not to show grouping patterns clearly without additional annotation information. In contrast, similarity-based dimension reduction methods such as t-SNE and UMAP present clear grouping patterns, although they lack an encoder/decoder structure.
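
For orientation, the encoder/decoder roles described above correspond to the two distributions in the standard VAE objective (the evidence lower bound). This is the textbook form, not a formula quoted from the abstract; the symbols $q_\phi$ (encoder), $p_\theta$ (decoder), and $p(z)$ (latent prior) are the usual notation and are assumptions here.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

The first term rewards accurate reconstruction by the decoder, and the second keeps the encoder's output close to the prior. Neither term refers to similarities between samples, which is consistent with the observation that a plain VAE may not separate groups clearly.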

RESULTS

To bridge this gap, we propose a new approach that incorporates similarity information into the VAE framework. In addition, for biological applications, we extend our approach to a conditional VAE to account for covariate effects in the dimension reduction step. In a simulation study and in analyses of real single-cell RNA sequencing data, our method performs well compared with existing state-of-the-art methods, producing clear grouping structures using an inferred encoder and decoder. Our method also successfully adjusts for covariate effects, resulting in more useful dimension reduction.

CONCLUSIONS

Our method produces clearer grouping patterns than other regularized VAE methods by using the similarity information encoded in the data via the widely used UMAP loss function.
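
The abstract does not state the objective itself, so the following is only a minimal NumPy sketch of how a UMAP-style similarity term might be added to a VAE loss; it is not the authors' implementation. All function names, the `weight` parameter, the squared-error reconstruction term, the values `a = b = 1`, and the Gaussian-kernel stand-in for UMAP's nearest-neighbour fuzzy graph are assumptions made for illustration.

```python
import numpy as np


def pairwise_sq_dist(a):
    """Squared Euclidean distance between every pair of rows of `a`."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def high_dim_similarity(x, bandwidth=1.0):
    """ASSUMPTION: a Gaussian kernel stands in for UMAP's kNN-based fuzzy graph."""
    return np.exp(-pairwise_sq_dist(x) / (2.0 * bandwidth ** 2))


def low_dim_similarity(z, a=1.0, b=1.0):
    """UMAP-style latent similarity q_ij = 1 / (1 + a * d_ij^(2b)).

    a = b = 1 is an illustrative choice, not a fitted UMAP setting.
    """
    return 1.0 / (1.0 + a * pairwise_sq_dist(z) ** b)


def umap_cross_entropy(p, q, eps=1e-8):
    """Fuzzy cross-entropy between p and q, averaged over pairs i < j.

    Terms that depend only on p are dropped, since they do not affect
    the latent coordinates.
    """
    iu = np.triu_indices_from(p, k=1)
    p_ij = p[iu]
    q_ij = np.clip(q[iu], eps, 1.0 - eps)
    attract = -p_ij * np.log(q_ij)             # pulls similar pairs together
    repel = -(1.0 - p_ij) * np.log(1.0 - q_ij)  # pushes dissimilar pairs apart
    return float(np.mean(attract + repel))


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), one value per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)


def similarity_assisted_loss(x, x_recon, mu, logvar, z, weight=1.0):
    """Hypothetical combined loss: reconstruction + KL + weight * similarity."""
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    kl = np.mean(gaussian_kl(mu, logvar))
    sim = umap_cross_entropy(high_dim_similarity(x), low_dim_similarity(z))
    return float(recon + kl + weight * sim)
```

In an actual model these terms would be computed on mini-batches inside an automatic-differentiation framework so that the encoder and decoder can be trained by gradient descent. The sketch only shows how the similarity term penalizes latent layouts that disagree with the similarities measured in the data.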

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a869/10647110/c3a7783d4a50/12859_2023_5552_Fig1_HTML.jpg
