Hu Joyce, Peng Beverly, Pankajam Ajith V, Xu Bingfang, Deshpande Vikrant Anil, Bueckle Andreas, Herr Bruce W, Börner Katy, Dupont Christopher, Scheuermann Richard H, Zhang Yun
bioRxiv. 2025 Apr 16:2025.04.10.648034. doi: 10.1101/2025.04.10.648034.
Advances in single-cell technologies have driven significant progress in constructing a multiscale, pan-organ Human Reference Atlas (HRA) of healthy human cells, though challenges remain in harmonizing cell types and unifying nomenclature. Multiple machine learning and artificial intelligence methods, including models pre-trained and fine-tuned on large-scale atlas data, are publicly available to the single-cell community for computationally annotating cell clusters and matching them to the reference atlas.
This study benchmarks four computational tools for cell type annotation and matching (Azimuth, CellTypist, scArches, and FR-Match) using two lung atlas datasets, the Human Lung Cell Atlas (HLCA) and the LungMAP single-cell reference (CellRef). Although the tools achieved high overall performance when their algorithmic annotations were compared with expert-annotated data, accuracy varied, especially for rare cell types, underlining the need for improved consistency across cell type prediction methods. The benchmarked methods were used to cross-compare and incrementally integrate 61 cell types from HLCA and 48 cell types from CellRef, resulting in a meta-atlas of 41 matched cell types, 20 HLCA-specific cell types, and 7 CellRef-specific cell types.
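The gap between high overall performance and weak rare-cell-type annotation can be illustrated with a minimal sketch. The labels below are made-up toy data, not results from the study, and `overall_accuracy` and `per_type_recall` are hypothetical helper names rather than functions from any of the benchmarked tools.

```python
from collections import Counter


def overall_accuracy(expert, predicted):
    """Fraction of cells whose predicted label equals the expert label."""
    matches = sum(e == p for e, p in zip(expert, predicted))
    return matches / len(expert)


def per_type_recall(expert, predicted):
    """For each expert-annotated cell type, the fraction of its cells labeled correctly."""
    totals = Counter(expert)
    correct = Counter(e for e, p in zip(expert, predicted) if e == p)
    return {cell_type: correct[cell_type] / n for cell_type, n in totals.items()}


# Toy data (hypothetical): 95 cells of an abundant type and 5 cells of a rare type.
expert = ["AT2"] * 95 + ["ionocyte"] * 5
# The hypothetical classifier labels 4 of the 5 rare cells as the abundant type.
predicted = ["AT2"] * 95 + ["AT2"] * 4 + ["ionocyte"] * 1

acc = overall_accuracy(expert, predicted)
recall = per_type_recall(expert, predicted)
print(f"overall accuracy: {acc:.2f}")
for cell_type, value in recall.items():
    print(f"{cell_type}: recall {value:.2f}")
```

In this toy case the overall accuracy is dominated by the abundant type, so a per-cell-type metric is needed to expose the poor annotation of the rare type.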
This study reveals complementary strengths of the benchmarked methods and presents a framework for incrementally growing the cell type inventory in the reference atlases, leading to 68 unique cell types in the meta-atlas across CellRef and HLCA. The benchmarking analysis contributes to improving the coverage and quality of HRA construction by assessing the reliability and performance of cell type annotation approaches for single-cell transcriptomics datasets.
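The reported counts can be checked with a few lines of arithmetic. This sketch assumes that each of the 41 matched cell types corresponds to exactly one HLCA type and one CellRef type, which is consistent with the totals given but is not stated explicitly in the abstract.

```python
# Counts as reported in the abstract.
hlca_total = 61
cellref_total = 48
matched = 41
hlca_specific = 20
cellref_specific = 7

# Assumption: matches are one-to-one, so each atlas splits into matched + atlas-specific types.
assert matched + hlca_specific == hlca_total
assert matched + cellref_specific == cellref_total

# The meta-atlas counts every matched type once, plus the types unique to each atlas.
meta_atlas_total = matched + hlca_specific + cellref_specific
print(meta_atlas_total)
```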