A deep neural network to de-noise single-cell RNA sequencing data.

Author information

Sharifitabar Mohsen, Kazempour Shiva, Razavian Javad, Sajedi Sogand, Solhjoo Soroosh, Zare Habil

Publication information

bioRxiv. 2024 Nov 21:2024.11.20.624552. doi: 10.1101/2024.11.20.624552.

DOI:10.1101/2024.11.20.624552

PMID:39605470

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11601639/

Abstract

Single-cell RNA sequencing (scRNA-seq), a powerful technique for investigating the transcriptome of individual cells, enables the discovery of heterogeneous cell populations, rare cell types, and transcriptional dynamics in separate cells. Yet, scRNA-seq data analysis is limited by the problem of measurement dropouts, i.e., genes displaying zero expression levels. We introduce ZiPo, a deep artificial neural network for rate estimation and library size prediction in scRNA-seq data which incorporates adjustable zero inflation in the distribution to capture the dropouts. ZiPo builds upon established concepts, including using deep autoencoders and adopting the Poisson and negative binomial distributions, by taking advantage of novel strategies, including library size prediction and residual connections, to improve the overall performance. A significant innovation of ZiPo is the introduction of a scale-invariant loss term, making the weights sparse and, hence, the model biologically more interpretable. ZiPo quickly handles vast singular and mixed datasets, with the processing time directly proportional to the number of cells. In this paper, we demonstrate the power of ZiPo on three datasets and show its advantages over other current techniques. The code used to produce the results in this manuscript is available at https://bitbucket.org/habilzare/alzheimer/src/master/code/deep/ZiPo/.
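To make the zero-inflation idea concrete, here is a minimal sketch of the textbook zero-inflated Poisson (ZIP) log-likelihood in plain Python. It is not the authors' implementation: the abstract does not give ZiPo's loss, and the function names (`zip_log_pmf`, `zip_nll`), the parameter names (`pi` for the dropout probability, `rate`, `library_size`), and the choice to model the Poisson mean as library size times rate are assumptions made for illustration only.

```python
import math


def zip_log_pmf(k: int, rate: float, pi: float) -> float:
    """Log-probability of count k under a zero-inflated Poisson.

    rate: Poisson mean (assumed > 0).
    pi:   probability of a structural zero, i.e. a dropout (assumed in [0, 1)).
    """
    if k < 0 or rate <= 0.0 or not (0.0 <= pi < 1.0):
        raise ValueError("need k >= 0, rate > 0 and 0 <= pi < 1")
    if k == 0:
        # A zero is either a dropout or a genuine Poisson zero.
        return math.log(pi + (1.0 - pi) * math.exp(-rate))
    # A positive count can only come from the Poisson component.
    log_poisson = k * math.log(rate) - rate - math.lgamma(k + 1)
    return math.log(1.0 - pi) + log_poisson


def zip_nll(counts, rates, pis, library_size=1.0):
    """Negative log-likelihood of one cell's counts across genes.

    Assumption for this sketch: the Poisson mean of each gene is
    library_size * rate, so rates are comparable across cells.
    """
    return -sum(
        zip_log_pmf(k, library_size * r, p)
        for k, r, p in zip(counts, rates, pis)
    )
```

For example, `zip_nll([0, 3, 0], [0.5, 2.0, 0.1], [0.3, 0.3, 0.3], library_size=1.5)` scores one hypothetical cell with three genes. Setting `pi` to 0 recovers the ordinary Poisson likelihood, which is what makes the zero inflation "adjustable" in this sketch.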
