Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China.
Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.
Cell Rep Methods. 2023 Sep 25;3(9):100577. doi: 10.1016/j.crmeth.2023.100577. Epub 2023 Aug 31.
The rapid accumulation of single-cell RNA-seq data has provided rich resources for characterizing human cell populations. However, accurate cell-type annotation using public references remains challenging because of inconsistent annotations, batch effects, and rare cell types. Here, we introduce SELINA (single-cell identity navigator), an integrative and automatic cell-type annotation framework based on a pre-curated reference atlas spanning various tissues. SELINA employs a multiple-adversarial domain adaptation network to remove batch effects within the reference dataset. It also improves the annotation of less frequent cell types through synthetic minority oversampling, and fits query data to the reference data using an autoencoder. SELINA yields a comprehensive and uniform reference atlas comprising 1.7 million cells and covering 230 distinct human cell types. We demonstrate its robustness and superior performance across many human tissues. Notably, SELINA accurately annotates cells in diverse disease contexts. SELINA provides a complete solution for human single-cell RNA-seq data annotation, with both Python and R packages.
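The synthetic minority oversampling step mentioned in the abstract can be illustrated with a minimal SMOTE-style sketch. This is not SELINA's own code: the function name `smote_oversample`, its parameters, and the toy expression matrix are all assumptions made for illustration, and the package's actual implementation may differ. The idea is to create new samples for a rare cell type by interpolating between an existing cell and one of its nearest neighbors of the same type.

```python
import numpy as np


def smote_oversample(X, y, minority_label, n_new, k=5, seed=0):
    """Return n_new synthetic samples for one rare class (hypothetical helper).

    X: (n_cells, n_features) array, y: (n_cells,) array of labels.
    Each synthetic sample lies on the segment between a minority cell
    and one of its k nearest minority neighbors.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base cell and one of its neighbors for each new sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position along the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy data (assumed): 200 cells of a common type, 10 cells of a rare type.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (200, 20)), rng.normal(3.0, 1.0, (10, 20))])
y = np.array(["common"] * 200 + ["rare"] * 10)

X_syn = smote_oversample(X, y, "rare", n_new=190)
X_balanced = np.vstack([X, X_syn])
y_balanced = np.concatenate([y, np.array(["rare"] * len(X_syn))])
```

Because every synthetic cell is a convex combination of two real cells of the same type, the new samples stay within the range of the rare population instead of simply duplicating existing cells.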