State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China.
Department of General Surgery, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, PR China.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad179.
Single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. As more such data sets become available, accurate cell type annotation with compatible and robust models becomes a prerequisite for their assessment. We therefore developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets that works primarily at the level of single-cell clusters and jointly uses a deconvolution strategy and logistic regression. To support this methodology, we constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues). scAnno obtains genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expressed genes with a core of seed genes. Importantly, scAnno can accurately identify cell type-specific genes from cell type reference expression profiles without any prior information. In the peripheral blood mononuclear cell data set in particular, the marker genes identified by scAnno showed cell type-specific expression, and the majority of them matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also showed that it outperforms other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) in internal validation of data sets (average annotation accuracy: 99.05%) and on cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first methodology that uses a deconvolution strategy for automated cell typing, and it can serve as a significant application in broader scRNA-seq analysis.
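To make the idea of cluster-level annotation by deconvolution more concrete, the following is a minimal sketch and not the scAnno implementation. It assumes a hypothetical reference profile (mean expression of a few marker genes per cell type) and a cluster's mean expression vector, fits non-negative weights by projected gradient descent, and labels the cluster with the cell type that receives the largest weight. The function names, the toy gene panel and all numbers are illustrative assumptions; the logistic regression step mentioned in the abstract is not covered here.

```python
# Sketch only: non-negative least-squares deconvolution of one cluster's
# mean expression against a reference profile, in pure Python.
# All names and values below are hypothetical, not taken from scAnno.

def deconvolve(profiles, cluster_mean, n_iter=2000):
    """Return non-negative weights w minimising
    0.5 * || sum_k w[k] * profiles[k] - cluster_mean ||^2.

    profiles     : dict mapping cell type -> list of expression values per gene
    cluster_mean : list of mean expression values for the same genes
    """
    names = list(profiles)
    n_genes = len(cluster_mean)
    # The sum of squared entries bounds the largest eigenvalue of the
    # Gram matrix, so 1 / that sum is a safe gradient step size.
    step = 1.0 / sum(v * v for name in names for v in profiles[name])
    weights = {name: 0.0 for name in names}
    for _ in range(n_iter):
        fitted = [sum(weights[n] * profiles[n][g] for n in names)
                  for g in range(n_genes)]
        residual = [fitted[g] - cluster_mean[g] for g in range(n_genes)]
        for n in names:
            grad = sum(residual[g] * profiles[n][g] for g in range(n_genes))
            # Gradient step followed by projection onto w >= 0.
            weights[n] = max(0.0, weights[n] - step * grad)
    return weights


def annotate_cluster(profiles, cluster_mean):
    """Label a cluster with the cell type that has the largest weight."""
    weights = deconvolve(profiles, cluster_mean)
    return max(weights, key=weights.get)


if __name__ == "__main__":
    # Toy reference over four genes (e.g. CD3E, CD19, CD14, NKG7); made-up values.
    reference = {
        "T cell": [9.0, 0.0, 0.0, 2.0],
        "B cell": [0.0, 8.0, 0.0, 0.0],
        "Monocyte": [0.0, 0.0, 10.0, 0.0],
    }
    cluster = [4.5, 0.1, 0.2, 1.0]
    print(annotate_cluster(reference, cluster))
    print(deconvolve(reference, cluster))
```

In practice one would likely replace the hand-written solver with an established routine such as `scipy.optimize.nnls` and work on full expression matrices; the pure-Python version is used here only to keep the sketch free of dependencies.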
scAnno is available at https://github.com/liuhong-jia/scAnno.