Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK.
Cell. 2023 Dec 21;186(26):5876-5891.e20. doi: 10.1016/j.cell.2023.11.026.
Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ∼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.