The Deep Generative Decoder: MAP estimation of representations improves modelling of single-cell RNA data.

Affiliations

Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark.

Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark.

Publication information

Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad497.

DOI:10.1093/bioinformatics/btad497

PMID:37572301

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10483129/

Abstract

MOTIVATION

Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models, such as variational autoencoders, which use a variational approximation of the likelihood for inference.

RESULTS

Here we present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori estimation. The DGD handles complex parameterized latent distributions naturally. Variational autoencoders, by contrast, typically use a fixed Gaussian distribution because adding other types of distribution is complex. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Second, we apply the model to multiple single-cell datasets. Here, the DGD learns low-dimensional, meaningful, and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable variational autoencoder.
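To make the idea of estimating representations directly by maximum a posteriori (MAP) more concrete, the following is a minimal sketch and not the scDGD implementation. It assumes a linear decoder, a Gaussian likelihood, and standard Gaussian priors on both the representations and the decoder weights, whereas the DGD described above is a deep model with a complex parameterized latent distribution. The function name `fit_map_decoder` and all hyperparameters are hypothetical. The point it illustrates is that there is no encoder: each sample owns a free representation vector that is optimized jointly with the decoder parameters.

```python
import numpy as np

def fit_map_decoder(X, latent_dim=2, steps=500, lr=0.05, weight_decay=1.0, seed=0):
    """Jointly fit per-sample representations Z and linear decoder weights W
    by gradient descent on a negative log posterior (toy assumption:
    Gaussian likelihood, standard Gaussian priors on Z and W)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = 0.1 * rng.standard_normal((n, latent_dim))  # one free representation per sample
    W = 0.1 * rng.standard_normal((latent_dim, d))  # decoder parameters
    losses = []
    for _ in range(steps):
        R = Z @ W - X  # reconstruction residuals
        # Negative log posterior (up to constants), averaged over samples.
        loss = (0.5 * np.sum(R**2) + 0.5 * np.sum(Z**2)) / n \
               + 0.5 * weight_decay * np.sum(W**2)
        losses.append(loss)
        grad_Z = R @ W.T + Z                     # per-sample gradient w.r.t. each z_i
        grad_W = Z.T @ R / n + weight_decay * W  # gradient w.r.t. decoder weights
        Z -= lr * grad_Z
        W -= lr * grad_W
    return Z, W, losses

if __name__ == "__main__":
    # Toy data: 100 samples with 5 features generated from 2 latent factors.
    rng = np.random.default_rng(1)
    W_true = np.array([[2.0, 0.0, 1.0, 0.0, 1.0],
                       [0.0, 2.0, 0.0, 1.0, -1.0]])
    X = rng.standard_normal((100, 2)) @ W_true
    Z, W, losses = fit_map_decoder(X)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

In this sketch, a representation for a new sample would be obtained the same way: hold `W` fixed and optimize only the new representation vector, instead of passing the sample through an encoder.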

AVAILABILITY AND IMPLEMENTATION

scDGD is available as a Python package at https://github.com/Center-for-Health-Data-Science/scDGD. The remaining code is available at https://github.com/Center-for-Health-Data-Science/dgd.
