Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249, USA.
Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA.
Genes (Basel). 2020 Mar 31;11(4):377. doi: 10.3390/genes11040377.
Single-cell RNA sequencing (scRNA-seq) is a powerful technology for obtaining transcriptomes at single-cell resolution. However, it suffers from dropout events (i.e., excess zero counts) because only a small fraction of the transcripts in each cell are captured during sequencing. This inherent sparsity of the expression profiles hinders cell-level and gene-level characterization, such as cell type identification, as well as other downstream analyses. To alleviate the dropout issue, we introduce a network-based method, netImpute, which leverages the hidden information in gene co-expression networks to recover real signals. netImpute employs Random Walk with Restart (RWR) to adjust the expression level of each gene in a given cell by borrowing information from that gene's neighbors in a gene co-expression network. Performance evaluation and comparison with existing tools on simulated data and seven real datasets show that netImpute substantially enhances clustering accuracy and data visualization clarity, thanks to its effective treatment of dropouts. Although the idea of netImpute is general and can be applied with other types of networks, such as a cell co-expression network or a protein-protein interaction (PPI) network, the evaluation results show that the gene co-expression network is consistently more beneficial, presumably because a PPI network usually lacks cell type context, while a cell co-expression network can cause information loss for rare cell types. Evaluation results on several biological datasets show that netImpute recovers missing transcripts in scRNA-seq data more effectively than existing methods and improves the identification and visualization of heterogeneous cell types.
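The RWR step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rwr_impute`, the use of Pearson correlation with a top-`k` neighbor rule to build the gene co-expression network, the restart probability `restart`, and the fixed iteration count `n_iter` are all assumptions made for illustration. The sketch only shows the general idea of smoothing each cell's expression vector over a gene-gene network while repeatedly restarting from the observed values.

```python
import numpy as np


def rwr_impute(X, k=5, restart=0.5, n_iter=50):
    """Smooth a genes x cells matrix X over a gene co-expression network.

    Hypothetical helper: network construction and parameters are assumptions,
    not taken from the netImpute paper.
    """
    X = np.asarray(X, dtype=float)
    n_genes = X.shape[0]

    # Gene-gene Pearson correlation; constant genes give NaN, which we zero out.
    with np.errstate(invalid="ignore", divide="ignore"):
        C = np.nan_to_num(np.corrcoef(X))
    np.fill_diagonal(C, 0.0)
    C[C < 0] = 0.0  # keep only positive co-expression (an assumption)

    # Keep each gene's k strongest neighbors, then symmetrize.
    k = min(k, n_genes - 1)
    top = np.argsort(-C, axis=1)[:, :k]
    rows = np.arange(n_genes)[:, None]
    A = np.zeros_like(C)
    A[rows, top] = C[rows, top]
    A = np.maximum(A, A.T)

    # Give isolated genes a self-loop so every column can be normalized.
    iso = np.flatnonzero(A.sum(axis=0) == 0)
    A[iso, iso] = 1.0
    W = A / A.sum(axis=0, keepdims=True)  # column-stochastic transition matrix

    # Random walk with restart: F <- (1 - r) * W @ F + r * X
    F = X.copy()
    for _ in range(n_iter):
        F = (1.0 - restart) * (W @ F) + restart * X
    return F


# Toy example: gene 1 has a dropout (zero) in cell 1, while its
# co-expressed partner, gene 0, is clearly expressed there.
X_toy = np.array([
    [5, 4, 6, 0, 0, 1],
    [5, 0, 6, 0, 1, 0],
    [0, 1, 0, 7, 6, 8],
    [1, 0, 0, 7, 7, 8],
])
X_imputed = rwr_impute(X_toy, k=1)
print(np.round(X_imputed, 2))
```

Because the transition matrix is column-normalized, the walk redistributes signal among neighboring genes without changing the total expression of each cell. A larger `restart` keeps the output closer to the observed counts, and a smaller one smooths more aggressively.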