Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.
TUM School of Life Sciences Weihenstephan, Technische Universität München, Freising, Germany.
Nat Commun. 2019 Jan 23;10(1):390. doi: 10.1038/s41467-018-07931-2.
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods are needed for scRNA-seq data that are increasingly large but sparse. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, and it captures nonlinear gene-gene dependencies. Our method scales linearly with the number of cells and can therefore be applied to datasets of millions of cells. Using simulated and real datasets, we demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
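To make the noise model named above concrete, the following is a minimal sketch of a zero-inflated negative binomial (ZINB) log-likelihood for a single count. It is not DCA's implementation. The parameterization by a mean `mu`, a dispersion `theta` and a dropout probability `pi`, and all function names, are assumptions made here for illustration; the abstract does not specify them. Setting `pi = 0` gives the negative binomial model without zero-inflation.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial.

    Assumed parameterization: mean mu > 0 and dispersion theta > 0,
    so that the variance is mu + mu**2 / theta (overdispersion).
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi is the assumed probability that an observation is an excess
    (dropout) zero; with probability 1 - pi the count is drawn from
    the negative binomial.
    """
    prob = (1.0 - pi) * math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        prob += pi  # zeros can come from either mixture component
    return math.log(prob)


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one parameter set."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)
```

In a denoising autoencoder of the kind the abstract describes, a loss of this form would presumably be evaluated with parameters predicted by the network rather than with fixed values as here; that wiring is outside this sketch.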