Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912, USA.
Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA.
Genome Res. 2020 Feb;30(2):195-204. doi: 10.1101/gr.251603.119. Epub 2020 Jan 28.
Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close to each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.
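The idea of network-regularized non-negative matrix factorization can be illustrated with a short sketch. It assumes a simplified objective, minimizing ||X − WH||_F² + λ tr(WᵀLW), where X is a genes × cells matrix of log-transformed counts, W (genes × k) and H (k × cells) are non-negative, and L = D − A is the Laplacian of a gene-gene interaction network with adjacency A and degree matrix D. The Laplacian term is small when genes connected in the network have similar rows in W, which is one way to encode "nearby in the low-dimensional representation". The function names, the multiplicative update rules (the standard pattern for graph-regularized NMF), and the use of argmax over H for clustering are illustrative assumptions; the published netNMF-sc objective and clustering procedure may differ, for example in how zero entries are handled.

```python
import numpy as np


def degree_and_adjacency(adj):
    """Return (D, A) for a symmetric gene-gene adjacency matrix, so L = D - A."""
    a = np.asarray(adj, dtype=float)
    return np.diag(a.sum(axis=1)), a


def network_regularized_nmf(X, adj, k, lam=1.0, n_iter=500, seed=0, eps=1e-10):
    """Hypothetical sketch: approximate X (genes x cells) as W @ H, W, H >= 0.

    Minimizes ||X - W H||_F^2 + lam * tr(W^T L W) with multiplicative updates,
    where L = D - A is the Laplacian of the gene network (assumed objective).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k))
    H = rng.random((k, n_cells))
    D, A = degree_and_adjacency(adj)
    for _ in range(n_iter):
        # Gene-factor update: the lam * A @ W term pulls rows of W for
        # network-connected genes toward each other.
        W *= (X @ H.T + lam * A @ W) / (W @ H @ H.T + lam * D @ W + eps)
        # Cell-factor update: the standard NMF multiplicative rule.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H


def objective(X, W, H, adj, lam):
    """Value of the assumed objective ||X - WH||_F^2 + lam * tr(W^T L W)."""
    D, A = degree_and_adjacency(adj)
    L = D - A
    return np.linalg.norm(X - W @ H, "fro") ** 2 + lam * np.trace(W.T @ L @ W)


# Toy usage on synthetic data (not data from the paper).
rng = np.random.default_rng(1)
X = rng.poisson(2.0, size=(20, 30)).astype(float)
X[rng.random(X.shape) < 0.6] = 0.0      # simulate roughly 60% dropout
X = np.log1p(X)                          # log-transform counts
adj = np.zeros((20, 20))
adj[0, 1] = adj[1, 0] = 1.0              # one hypothetical known interaction
W, H = network_regularized_nmf(X, adj, k=3)
imputed = W @ H                          # dense estimate, including zero entries
clusters = H.argmax(axis=0)              # one cluster label per cell
```

Because WH is dense, the product gives an estimate at every entry, including observed zeros. Assigning each cell to its largest factor in H is only the simplest clustering choice; k-means on the columns of H would be a common alternative.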