Department of Automation, Xiamen University, Xiamen, 361005, China.
Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China.
BMC Genomics. 2019 May 8;20(1):347. doi: 10.1186/s12864-019-5747-5.
Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, which hinder reliable quantification of lowly and moderately expressed genes. Because most downstream analyses of scRNA-seq data, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing is a critical first step in scRNA-seq analysis.
We present scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering lost gene expression, correcting gene expression measurements, and learning similarities between cells. scNPF leverages both the context-specific topology inherent in the given data and prior knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We demonstrate the potential of scNPF in scRNA-seq preprocessing to accurately recover gene expression values and learn cell similarity networks. A comprehensive evaluation across a wide spectrum of scRNA-seq data sets showed that scNPF performed comparably to or better than competing approaches on various metrics of internal validation and clustering accuracy. scNPF is available as an easy-to-use R package that can serve as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines and tools.
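To make "network propagation" concrete, the sketch below smooths each cell's expression profile over a gene-gene network with a random walk with restart (RWR), a common form of network propagation. It is a minimal illustration under stated assumptions, not scNPF's implementation: the function names, the restart probability `restart`, and the column-normalised update rule are our own choices, and the network is a toy adjacency matrix rather than a real interaction database.

```python
from typing import List

Matrix = List[List[float]]  # dense matrix as a list of rows


def column_normalize(adj: Matrix) -> Matrix:
    """Scale each column of a non-negative adjacency matrix to sum to 1.

    Columns of isolated genes (all zeros) are left as zeros.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def propagate(adj: Matrix, p0: List[float], restart: float = 0.5,
              tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Random walk with restart: iterate p <- (1 - restart) * W p + restart * p0.

    W is the column-normalised network; p0 is one cell's expression over genes.
    Stops when the L1 change between iterations drops below tol.
    """
    w = column_normalize(adj)
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        p_next = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
                  + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


def smooth_expression(adj: Matrix, expr: Matrix) -> Matrix:
    """Propagate every cell (column of a genes x cells matrix) over the network."""
    n_genes, n_cells = len(expr), len(expr[0])
    cols = [propagate(adj, [expr[g][c] for g in range(n_genes)])
            for c in range(n_cells)]
    # Reassemble the propagated columns into a genes x cells matrix.
    return [[cols[c][g] for c in range(n_cells)] for g in range(n_genes)]


if __name__ == "__main__":
    # Toy path network: gene0 - gene1 - gene2; gene1 is a dropout (0) in this cell.
    network = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(propagate(network, [4.0, 0.0, 2.0]))  # approximately [2.5, 2.0, 1.5]
```

In this toy example the dropout gene receives a non-zero value from its neighbours. Because the normalised network is column-stochastic, the cell's total expression (here 6) is preserved when no gene is isolated; a larger `restart` keeps the result closer to the observed profile.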
scNPF is a universal tool for preprocessing scRNA-seq data that jointly incorporates the global topology of prior interaction networks and the context-specific information encapsulated in the scRNA-seq data, capturing both shared and complementary knowledge from diverse data sources. scNPF can be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization.
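Combining "shared and complementary knowledge" from several sources can be illustrated with a simplified cross-diffusion in the spirit of similarity network fusion (SNF). The sketch below fuses two cell-cell similarity matrices; it is an assumption-laden illustration, not scNPF's exact fusion step. The helper names, the neighbourhood size `k`, the fixed iteration count, and the per-step symmetrisation are our own choices, and all similarities are assumed to be positive.

```python
from typing import List

Matrix = List[List[float]]  # dense matrix as a list of rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def symmetrize(a: Matrix) -> Matrix:
    return [[(a[i][j] + a[j][i]) / 2 for j in range(len(a))] for i in range(len(a))]


def full_kernel(w: Matrix) -> Matrix:
    """Global kernel: diagonal 1/2, off-diagonal entries of each row sum to 1/2."""
    n = len(w)
    p = []
    for i in range(n):
        off = sum(w[i][k] for k in range(n) if k != i)  # assumed > 0
        p.append([0.5 if i == j else w[i][j] / (2 * off) for j in range(n)])
    return p


def knn_kernel(w: Matrix, k: int) -> Matrix:
    """Local kernel: keep each cell's k most similar other cells, row-normalised."""
    n = len(w)
    s = []
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: w[i][j], reverse=True)[:k]
        total = sum(w[i][j] for j in nbrs)
        s.append([w[i][j] / total if j in nbrs else 0.0 for j in range(n)])
    return s


def fuse(w1: Matrix, w2: Matrix, k: int = 2, iterations: int = 20) -> Matrix:
    """Cross-diffuse two similarity networks and average the results."""
    p1, p2 = full_kernel(w1), full_kernel(w2)
    s1, s2 = knn_kernel(w1, k), knn_kernel(w2, k)
    for _ in range(iterations):
        # Each network is updated through its own local kernel using the
        # other network's current global kernel (simultaneous update).
        new_p1 = symmetrize(matmul(matmul(s1, p2), transpose(s1)))
        new_p2 = symmetrize(matmul(matmul(s2, p1), transpose(s2)))
        p1, p2 = new_p1, new_p2
    n = len(p1)
    return [[(p1[i][j] + p2[i][j]) / 2 for j in range(n)] for i in range(n)]
```

For two toy networks that agree on a two-group structure (for example, one derived from the data and one from prior-network-propagated profiles), the fused matrix keeps cells within a group more similar than cells across groups, while each network's local neighbourhoods shape the diffusion of the other.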