Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA.
Genomics Proteomics Bioinformatics. 2021 Apr;19(2):267-281. doi: 10.1016/j.gpb.2020.07.004. Epub 2020 Dec 24.
Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Several supervised or semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is unclear whether some classification methods originally designed for analyzing other bulk omics data can be adapted to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods that are publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research: Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two were repurposed from methods for deconvoluting DNA methylation data: linear constrained projection (CP) and robust partial correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulated data. We assessed accuracy through intra-dataset and inter-dataset predictions; robustness to practical challenges such as gene filtering, high similarity among cell types, and an increasing number of cell type classes; and the detection of rare and unknown cell types. Overall, Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat had a major drawback in predicting rare cell populations, and it was suboptimal at differentiating highly similar cell types compared with SingleR and RPC. All code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.
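The abstract does not spell out how a bulk deconvolution method such as CP can label individual cells. The sketch below illustrates one plausible adaptation, under stated assumptions: each cell's expression vector is modelled as a non-negative combination of reference cell-type profiles whose weights sum to one (the constrained-projection idea; some CP formulations only require the sum to be at most one), and the cell receives the label with the largest weight. The function names, the toy reference matrix, and the projected-gradient solver are hypothetical illustrations, not the benchmark's code or the R implementation it used.

```python
# Hypothetical sketch of CP-style single-cell labelling (not the benchmark's code).
# Assumption: a cell's expression y ~ X w, where the columns of X are reference
# cell-type profiles and w lies on the probability simplex (w >= 0, sum(w) == 1).


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index satisfying the condition defines theta
    return [max(vi - theta, 0.0) for vi in v]


def constrained_projection(X, y, steps=500):
    """Minimise ||y - X w||^2 over the simplex by projected gradient descent.

    X: genes x cell types (list of rows); y: one cell's expression, same genes.
    """
    n, k = len(X), len(X[0])
    # Gram matrix X^T X and vector X^T y; the gradient is 2 * (X^T X w - X^T y).
    xtx = [[sum(X[g][a] * X[g][b] for g in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[g][a] * y[g] for g in range(n)) for a in range(k)]
    # 2 * trace(X^T X) bounds the gradient's Lipschitz constant, so 1/bound is a safe step.
    bound = 2.0 * sum(xtx[a][a] for a in range(k))
    lr = 1.0 / bound if bound > 0 else 1.0
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(steps):
        grad = [2.0 * (sum(xtx[a][b] * w[b] for b in range(k)) - xty[a])
                for a in range(k)]
        w = project_to_simplex([w[a] - lr * grad[a] for a in range(k)])
    return w


def annotate(X, labels, cells):
    """Label each cell with the reference type that receives the largest weight."""
    result = []
    for y in cells:
        w = constrained_projection(X, y)
        result.append(labels[max(range(len(w)), key=lambda j: w[j])])
    return result


# Toy reference: 3 marker genes (rows) x 2 hypothetical cell types (columns).
X = [[5.0, 0.1],
     [4.0, 0.2],
     [0.1, 6.0]]
labels = ["T cell", "B cell"]
cells = [[4.8, 3.9, 0.3],   # resembles the first profile
         [0.2, 0.4, 5.5]]   # resembles the second profile
print(annotate(X, labels, cells))  # ['T cell', 'B cell']
```

Assigning the arg-max weight turns a mixture-proportion estimate into a hard label; a real pipeline would also need normalized expression, a marker-gene reference built from training data, and possibly a threshold on the top weight to flag unknown cell types.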