Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America.
Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, Connecticut, United States of America.
PLoS Comput Biol. 2021 May 18;17(5):e1009029. doi: 10.1371/journal.pcbi.1009029. eCollection 2021 May.
Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets.
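The idea of borrowing information from adjacent genes in a gene graph can be illustrated with a minimal sketch. This is not the G2S3 implementation: the abstract does not specify how the sparse graph is learned or how information is propagated, so the sketch substitutes a simple k-nearest-neighbor graph on gene expression profiles and a single lazy averaging step. The function names `knn_gene_graph` and `impute_from_neighbors`, and the parameters `k` and `alpha`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def knn_gene_graph(X, k=1):
    """Build a symmetric k-nearest-neighbor gene graph (a stand-in, by
    assumption, for a learned sparse gene graph).
    X is a genes x cells expression matrix."""
    X = np.asarray(X, dtype=float)
    n_genes = X.shape[0]
    # Pairwise squared Euclidean distances between gene profiles across cells.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, np.inf)  # a gene is never its own neighbor
    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        W[i, np.argsort(D[i])[:k]] = 1.0
    return np.maximum(W, W.T)  # symmetrize the adjacency matrix

def impute_from_neighbors(X, W, alpha=0.5):
    """One lazy averaging step: each gene keeps (1 - alpha) of its own
    expression and takes alpha from the mean of its graph neighbors."""
    X = np.asarray(X, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    return (1.0 - alpha) * X + alpha * (P @ X)

# Toy data: 4 genes x 3 cells; gene 1 has a zero in cell 1 that may be a dropout.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 0.0, 3.0],
    [10.0, 10.0, 10.0],
    [10.0, 11.0, 10.0],
])
W = knn_gene_graph(X, k=1)
X_imputed = impute_from_neighbors(X, W, alpha=0.5)
print(X_imputed)
```

In this toy example, gene 1 is linked to gene 0, so its zero entry is pulled toward gene 0's value in the same cell, while the unrelated high-expression genes 2 and 3 do not influence it. Restricting each gene to a few neighbors is what keeps the graph sparse in this sketch.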