Similarity-assisted variational autoencoder for nonlinear dimension reduction with application to single-cell RNA sequencing data.

Affiliations

Graduate School of Data Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.

Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.

Publication information

BMC Bioinformatics. 2023 Nov 14;24(1):432. doi: 10.1186/s12859-023-05552-1.

Abstract

BACKGROUND

Deep generative models serve naturally as nonlinear dimension reduction tools for visualizing large-scale datasets, such as single-cell RNA sequencing datasets, to reveal latent grouping patterns or identify outliers. The variational autoencoder (VAE) is a popular deep generative method equipped with an encoder/decoder structure. The encoder is useful for mapping a new sample to the latent space, and the decoder for generating a data point from a point in the latent space. However, the VAE tends not to show grouping patterns clearly without additional annotation information. In contrast, similarity-based dimension reduction methods such as t-SNE or UMAP present clear grouping patterns even though they lack an encoder/decoder structure.

RESULTS

To bridge this gap, we propose a new approach that incorporates similarity information into the VAE framework. In addition, for biological applications, we extend our approach to a conditional VAE to account for covariate effects in the dimension reduction step. In the simulation study and real single-cell RNA sequencing data analyses, our method compares favorably with existing state-of-the-art methods, producing clear grouping structures using an inferred encoder and decoder. Our method also successfully adjusts for covariate effects, resulting in more useful dimension reduction.

CONCLUSIONS

Our method produces clearer grouping patterns than other regularized VAE methods by utilizing the similarity information encoded in the data via the widely used UMAP loss function.
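
The abstract does not state the exact objective, so the following is only a minimal sketch of how a UMAP-style similarity term might be added to a standard VAE loss. All function names, the squared-error reconstruction term, the `weight` parameter, and the choice of `a = b = 1` in the low-dimensional similarity are assumptions made for illustration, not the authors' implementation. Here `p` is assumed to be a precomputed matrix of pairwise similarities in the original data space.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent
    dimensions and averaged over samples."""
    per_sample = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return float(np.mean(per_sample))

def low_dim_similarity(z, a=1.0, b=1.0):
    """UMAP-style latent similarity q_ij = 1 / (1 + a * ||z_i - z_j||^(2b))."""
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return 1.0 / (1.0 + a * sq_dist ** b)

def umap_cross_entropy(p, q, eps=1e-8):
    """Fuzzy cross-entropy between data-space similarities p and
    latent-space similarities q, averaged over off-diagonal pairs."""
    mask = ~np.eye(p.shape[0], dtype=bool)
    p_off = p[mask]
    q_off = np.clip(q[mask], eps, 1.0 - eps)
    return float(np.mean(-p_off * np.log(q_off) - (1.0 - p_off) * np.log(1.0 - q_off)))

def similarity_assisted_loss(x, x_hat, mu, log_var, p, weight=1.0):
    """Assumed combined objective: reconstruction + KL + weight * UMAP term.
    The squared-error reconstruction is a placeholder; count-based
    likelihoods are common for single-cell data."""
    recon = float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))
    similarity = umap_cross_entropy(p, low_dim_similarity(mu))
    return recon + gaussian_kl(mu, log_var) + weight * similarity
```

In this sketch the similarity term is evaluated on the encoder means `mu`, so it penalizes latent layouts in which pairs that are similar in the data end up far apart. For the conditional extension, one common design is to concatenate the covariates to the encoder and decoder inputs; whether the paper does exactly this is not stated in the abstract.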

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a869/10647110/c3a7783d4a50/12859_2023_5552_Fig1_HTML.jpg
