Qingdao University of Science and Technology, China.
Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad081.
Single-cell omics data are growing at an unprecedented rate, but integrating them effectively remains challenging because each omics data type differs in sequencing method, quality, and expression pattern. In this study, we propose GCN-SC, a universal framework for the integration of single-cell multi-omics data based on a graph convolutional network (GCN). Among multiple single-cell datasets, GCN-SC usually selects the dataset with the largest number of cells as the reference and treats the rest as query datasets. It uses a mutual nearest neighbor (MNN) algorithm to identify cell pairs, which provide connections between cells both within and across the reference and query datasets. A GCN algorithm then takes the mixed graph constructed from these cell pairs and uses it to adjust the count matrices of the query datasets. Finally, dimension reduction is performed using non-negative matrix factorization before visualization. By applying GCN-SC to six datasets, we show that it can effectively integrate sequencing data from multiple single-cell sequencing technologies, species, or omics types, and that it outperforms state-of-the-art methods including Seurat, LIGER, GLUER, and Pamona.
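The mutual nearest neighbor step can be illustrated with a minimal sketch. This is not the GCN-SC implementation: the function names (`k_nearest`, `mutual_nearest_pairs`), the Euclidean distance, the value of `k`, and the toy two-dimensional "cells" are all assumptions made for illustration. It only shows the general idea that a reference cell and a query cell form a pair when each is among the other's nearest neighbors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(source, target, k):
    """For each cell in `source`, return the set of indices of its k nearest cells in `target`."""
    neighbors = []
    for cell in source:
        ranked = sorted(range(len(target)), key=lambda j: euclidean(cell, target[j]))
        neighbors.append(set(ranked[:k]))
    return neighbors


def mutual_nearest_pairs(reference, query, k=1):
    """Return (reference_index, query_index) pairs that are mutual k-nearest neighbors."""
    ref_to_query = k_nearest(reference, query, k)
    query_to_ref = k_nearest(query, reference, k)
    return [
        (i, j)
        for i, nbrs in enumerate(ref_to_query)
        for j in sorted(nbrs)
        if i in query_to_ref[j]  # keep the pair only if the relation holds in both directions
    ]


# Hypothetical toy data: three reference cells and two query cells in a shared 2-D feature space.
reference = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
query = [[0.2, -0.1], [5.1, 4.8]]

pairs = mutual_nearest_pairs(reference, query, k=1)
print(pairs)  # [(0, 0), (1, 1)]
```

The third reference cell has the second query cell as its nearest neighbor, but that query cell is closer to the second reference cell, so no pair is formed for it. In a framework of the kind described above, pairs like these would supply the cross-dataset edges of the mixed graph; how GCN-SC builds within-dataset edges and weights the graph is not specified in the abstract.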