Huang Yu-An, Li Yue-Chao, You Zhu-Hong, Hu Lun, Hu Peng-Wei, Wang Lei, Peng Yuzhong, Huang Zhi-An
School of Computer Science, Northwestern Polytechnical University, Xi'an, 710000, China.
Research & Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen, 518063, China.
BMC Biol. 2025 Jan 23;23(1):23. doi: 10.1186/s12915-025-02128-8.
Recent advancements in single-cell RNA sequencing have greatly expanded our knowledge of the heterogeneous nature of tissues. However, robust and accurate cell type annotation continues to be a major challenge, hindered by issues such as marker specificity, batch effects, and a lack of comprehensive spatial and interaction data. Traditional annotation methods often fail to adequately address the complexity of cellular interactions and gene regulatory networks.
We propose scMCGraph, a computational framework that integrates gene expression with pathway activity to accurately annotate cell types across diverse scRNA-seq datasets. The model first constructs multiple pathway-specific views from several pathway databases, each reflecting both gene expression and pathway activity. These views are then integrated into a consensus graph, which is in turn used to reconstruct the individual pathway views. The model showed strong robustness and accuracy in cross-platform, cross-time, cross-sample, and clinical dataset evaluations.
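The idea of building one cell-cell graph per pathway and then merging them can be illustrated with a minimal sketch. This is not the scMCGraph implementation: the abstract does not specify how views are built or how the consensus graph is learned, so the cosine k-nearest-neighbour graphs, the edge-averaging consensus, and all names (`knn_graph`, `pathway_views`, `consensus_graph`, `min_support`) are assumptions made purely for illustration. The reconstruction step is omitted.

```python
import numpy as np


def knn_graph(features, k):
    """Symmetric k-nearest-neighbour adjacency over cells (cosine similarity)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero cells
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a cell is never its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either cell selected the other


def pathway_views(expr, pathways, k=2):
    """One cell-cell graph per pathway, using only that pathway's genes.

    expr: array of shape (cells, genes).
    pathways: dict mapping a pathway name to a list of gene column indices
    (a hypothetical stand-in for gene sets taken from a pathway database).
    """
    return {name: knn_graph(expr[:, genes], k) for name, genes in pathways.items()}


def consensus_graph(views, min_support=0.5):
    """Assumed consensus rule: keep edges present in at least `min_support`
    of the views, weighted by the fraction of views that contain them."""
    support = np.mean(np.stack(list(views.values())), axis=0)
    return np.where(support >= min_support, support, 0.0)


# Toy example: 6 cells, 4 genes, two made-up pathways.
rng = np.random.default_rng(0)
expr = rng.random((6, 4))
views = pathway_views(expr, {"pathway_A": [0, 1, 2], "pathway_B": [1, 2, 3]})
consensus = consensus_graph(views)
```

In this sketch the consensus matrix could then be passed to any graph-based classifier to propagate cell type labels; a learned consensus, as the abstract suggests the authors use, would replace the simple averaging rule.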
scMCGraph represents a significant advance in cell type annotation. Our experiments show that introducing pathway information significantly improves the learning of cell-cell graphs, and that the resulting consensus graph improves cell type prediction. Different pathway databases provide complementary information, and increasing the number of pathways can further improve model performance. Extensive testing shows that scMCGraph remains accurate and robust across a range of cross-dataset application scenarios.