Kumar Swain Asish, Singh Shekhawat Rajveer, Yadav Pankaj
Department of Bioscience and Bioengineering, Indian Institute of Technology (IIT), N.H. 62, Nagaur Road, Karwar, Jodhpur 342030, Rajasthan, India.
School of Artificial Intelligence and Data Science, Indian Institute of Technology (IIT), N.H. 62, Nagaur Road, Karwar, Jodhpur 342030, Rajasthan, India.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf253.
Cell-type annotation remains a major challenge in single-cell and spatial omics analysis. Most existing methods rely on single-cell RNA sequencing (scRNA-seq) references or predefined marker sets. However, the scarcity of high-quality scRNA-seq references and marker sets makes relying on a single approach prone to bias and limits usability. Furthermore, available methods for cell-type annotation in single-cell ATAC-sequencing (scATAC-seq) and spatial transcriptomics datasets perform poorly. Here, we present ScInfeR, a graph-based cell-type annotation method that combines information from both scRNA-seq references and marker sets. By integrating these two data sources, ScInfeR can accurately annotate a broad range of cell types. It employs a hierarchical framework inspired by message-passing layers in graph neural networks to accurately identify cell subtypes. ScInfeR is highly versatile, supporting cell annotation across scRNA-seq, scATAC-seq, and spatial omics datasets. For scATAC-seq, it utilizes chromatin accessibility data, while for spatial transcriptomics, it incorporates spatial coordinate information. Additionally, ScInfeR supports weighted positive and negative markers, allowing users to define marker importance in cell-type classification. Our extensive benchmarking across multiple atlas-scale scRNA-seq, scATAC-seq, and spatial datasets, evaluating 10 existing tools in over 100 cell-type prediction tasks, demonstrated ScInfeR's superior performance. Notably, it exhibits robustness against batch effects arising in these datasets. To facilitate seamless annotation, we developed ScInfeRDB, an interactive database containing manually curated scRNA-seq references and marker sets for 329 cell types, covering 2497 gene markers in 28 tissue types from humans and plants. ScInfeR is available as an R package, with both the tool and database publicly accessible at https://www.swainasish.in/scinfer.
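The abstract does not spell out how weighted positive and negative markers interact with the cell graph, so the following Python sketch only illustrates the general idea and is not ScInfeR's algorithm or API. It assumes a simple scheme: each cell gets a per-type score as a weighted sum of z-scored marker expression (negative weights acting as negative markers), and the scores are then smoothed once over a k-nearest-neighbour graph, loosely analogous to one message-passing step. All function names, the toy expression matrix, the marker weights, and the parameters `k` and `alpha` are hypothetical.

```python
import numpy as np

def marker_scores(expr, genes, markers):
    """Score each cell for each type from weighted markers.

    expr: (cells x genes) array; markers: {cell_type: {gene: weight}},
    where a negative weight marks a negative marker (assumed convention).
    """
    gene_idx = {g: i for i, g in enumerate(genes)}
    # z-score every gene across cells so markers are comparable
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-9)
    types = list(markers)
    scores = np.zeros((expr.shape[0], len(types)))
    for j, cell_type in enumerate(types):
        for gene, weight in markers[cell_type].items():
            if gene in gene_idx:  # silently skip markers absent from the data
                scores[:, j] += weight * z[:, gene_idx[gene]]
    return types, scores

def knn_adjacency(features, k):
    """Binary (directed) k-nearest-neighbour adjacency from Euclidean distance."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    adj = np.zeros((len(features), len(features)))
    rows = np.repeat(np.arange(len(features)), k)
    adj[rows, nbrs.ravel()] = 1.0
    return adj

def propagate(scores, adj, alpha=0.5):
    """One smoothing step: blend each cell's scores with its neighbours' mean."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    return (1 - alpha) * scores + alpha * (adj @ scores) / deg

def annotate(expr, genes, markers, k=3, alpha=0.5):
    types, scores = marker_scores(expr, genes, markers)
    scores = propagate(scores, knn_adjacency(expr, k), alpha)
    return [types[i] for i in scores.argmax(axis=1)]

# Toy data (hypothetical): 6 cells x 3 genes, two obvious groups.
genes = ["CD3E", "MS4A1", "LYZ"]
expr = np.array([[5, 0, 0], [6, 0, 1], [5, 1, 0],
                 [0, 5, 0], [0, 6, 1], [1, 5, 0]], dtype=float)
markers = {
    "T cell": {"CD3E": 1.0, "MS4A1": -0.5},   # MS4A1 as a negative marker
    "B cell": {"MS4A1": 1.0, "CD3E": -0.5},   # CD3E as a negative marker
}
labels = annotate(expr, genes, markers, k=2)
print(labels)
```

In this sketch the neighbour graph is built from expression alone; for spatial data one could, under the same assumptions, build it from spatial coordinates instead.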