Model-based autoencoders for imputing discrete single-cell RNA-seq data.

Affiliations

Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States.

NEC Laboratories America, Princeton, NJ 08540, United States.

Publication information

Methods. 2021 Aug;192:112-119. doi: 10.1016/j.ymeth.2020.09.010. Epub 2020 Sep 22.

Abstract

Deep neural networks have been widely applied to missing data imputation. However, most existing studies have focused on imputing continuous data, while discrete data imputation remains under-explored. Discrete data are common in the real world, especially in bioinformatics, genetics, and biochemistry. In particular, much recent genomic data consists of discrete counts generated by single-cell RNA sequencing (scRNA-seq) technology. Most scRNA-seq studies produce a discrete matrix in which 'false' zero counts (missing values) are prevalent. To make downstream analyses more effective, imputation, which recovers the missing values, is often conducted as the first step in pre-processing scRNA-seq data. In this paper, we propose a novel Zero-Inflated Negative Binomial (ZINB) model-based autoencoder for imputing discrete scRNA-seq data. The novelties of our method are twofold. First, in addition to optimizing the ZINB likelihood, we propose to explicitly model the dropout events that cause missing values by using the Gumbel-Softmax distribution. Second, the zero-inflated reconstruction is further optimized with respect to the raw count matrix. Extensive experiments on simulated datasets demonstrate that the zero-inflated reconstruction significantly improves imputation accuracy. Experiments on real data show that the proposed imputation can enhance the separation of different cell types and improve the accuracy of differential expression analysis.
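
The two ingredients named in the abstract, the ZINB likelihood and a Gumbel-Softmax relaxation of dropout events, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the function names, the mean/dispersion parameterization (`mu`, `theta`), the dropout probability `pi`, the temperature `tau`, and the way the relaxed dropout indicator masks the mean are all assumptions made for illustration.

```python
import math
import random


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated NB, where
    0 <= pi < 1 is the probability of a dropout ('false' zero)."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either a dropout or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A non-zero count can only come from the NB component.
    return math.log(1.0 - pi) + nb


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    gumbels = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
               for _ in logits]
    scaled = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical decoder outputs for a single gene in a single cell.
mu, theta, pi = 2.0, 1.5, 0.3
rng = random.Random(0)

# Relaxed indicator over the two categories (dropout, no dropout).
z_dropout, z_keep = gumbel_softmax([math.log(pi), math.log(1.0 - pi)], 0.5, rng)

# Assumed form of a "zero-inflated reconstruction": the NB mean, masked
# by the relaxed no-dropout indicator.
reconstruction = z_keep * mu
print(zinb_log_pmf(0, mu, theta, pi), reconstruction)
```

In an autoencoder, `mu`, `theta`, and `pi` would be per-gene, per-cell outputs of the decoder, and the negative ZINB log-likelihood would be summed over the count matrix as the training loss. Lower values of `tau` push the relaxed sample toward a hard 0/1 dropout decision while keeping it differentiable with respect to the logits.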
