A comparison of scRNA-seq annotation methods based on experimentally labeled immune cell subtype dataset.

Affiliations

Institutes of Biomedical Sciences, Fudan University, 200032 Shanghai, P.R. China.

Intelligent Medicine Institute, Fudan University, 200032 Shanghai, P.R. China.

Publication information

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae392.

DOI:10.1093/bib/bbae392

PMID:39120646

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11312369/

Abstract

Cell-type annotation is a critical step in single-cell data analysis. With the development of numerous cell annotation methods, it is necessary to evaluate these methods to help researchers use them effectively. Reference datasets are essential for evaluation, but the cell labels of current reference datasets mainly come from computational methods, which may carry computational biases and may not reflect the actual cell types. This study first constructed an experimentally labeled, single-batch, single-cell dataset of immune cell subtypes and then systematically evaluated 18 cell annotation methods. We assessed those methods under five scenarios: intra-dataset validation, immune cell-subtype validation, unsupervised clustering, inter-dataset annotation, and unknown cell-type prediction. Accuracy and the adjusted Rand index (ARI) were the evaluation metrics. The results showed that SVM, scBERT, and scDeepSort were the best-performing supervised methods. Seurat was the best-performing unsupervised clustering method, but it could not fully fit the actual cell-type distribution. Our results indicated that experimentally labeled immune cell-subtype datasets revealed the deficiencies of unsupervised clustering methods and provided new dataset support for supervised methods.
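To make the two evaluation metrics concrete, here is a minimal pure-Python sketch of how accuracy (for supervised annotation) and the ARI (for unsupervised clustering) can be computed against reference labels. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and the toy labels are hypothetical, and the ARI uses the standard pair-counting formula.

```python
from collections import Counter
from math import comb


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def adjusted_rand_index(true_labels, cluster_ids):
    """ARI between reference labels and cluster assignments (pair counting)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as perfect agreement

    # Pairs of cells that share both a reference label and a cluster.
    joint = Counter(zip(true_labels, cluster_ids))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    # Pairs that share a reference label, and pairs that share a cluster.
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_clust = sum(comb(c, 2) for c in Counter(cluster_ids).values())

    expected = sum_true * sum_clust / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_clust) / 2
    if max_index == expected:
        return 1.0  # degenerate partitions (e.g. a single group in both)
    return (sum_joint - expected) / (max_index - expected)


# Toy example (hypothetical labels, not data from the study).
reference = ["B", "B", "T", "T", "NK", "NK"]
predicted = ["B", "B", "T", "NK", "NK", "NK"]  # supervised annotation output
clusters = [0, 0, 1, 2, 2, 2]                  # unsupervised cluster IDs

print("accuracy:", accuracy(reference, predicted))
print("ARI:", adjusted_rand_index(reference, clusters))
```

Accuracy requires predicted labels that share the reference vocabulary, so it suits supervised annotators. The ARI only compares groupings and corrects for chance agreement, so it can score clusterings whose cluster IDs carry no cell-type names.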
