Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
Nat Commun. 2023 Jan 14;14(1):223. doi: 10.1038/s41467-023-35923-4.
Consistent annotation transfer from a reference dataset to a query dataset is fundamental to the development and reproducibility of single-cell research. Compared with traditional annotation methods, deep-learning-based methods are faster and more automated. A series of useful single-cell analysis tools based on autoencoder architectures have been developed, but they struggle to strike a balance between depth and interpretability. Here, we present TOSICA, a multi-head self-attention deep learning model based on Transformer that enables interpretable cell type annotation using biologically understandable entities, such as pathways or regulons. We show that TOSICA achieves fast and accurate one-stop annotation and batch-insensitive integration while providing biologically interpretable insights for understanding cellular behavior during development and disease progression. We demonstrate TOSICA's advantages by applying it to scRNA-seq data of tumor-infiltrating immune cells and of CD14+ monocytes in COVID-19, revealing rare cell types, heterogeneity and dynamic trajectories associated with disease progression and severity.
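To make the idea of attention over biologically understandable entities concrete, the following is a minimal NumPy sketch, not TOSICA's actual implementation. It assumes a binary gene-to-pathway membership mask, one embedding matrix per pathway, a learnable class token, and a single multi-head self-attention layer. All function names, shapes and the toy data (`pathway_tokens`, `multi_head_self_attention`, `mask`, `weights`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pathway_tokens(expr, mask, weights):
    """Embed one cell's expression vector as one token per pathway.

    expr:    (n_genes,) expression values for a single cell
    mask:    (n_genes, n_pathways) binary membership (assumed prior knowledge)
    weights: (n_pathways, n_genes, d_model) per-pathway embedding weights
    Only genes that belong to a pathway contribute to that pathway's token.
    """
    return np.einsum("g,gp,pgd->pd", expr, mask, weights)


def multi_head_self_attention(tokens, wq, wk, wv, n_heads):
    """Scaled dot-product self-attention with n_heads heads.

    Returns the attended tokens (n_tokens, d_model) and the attention
    weights (n_heads, n_tokens, n_tokens).
    """
    n_tokens, d_model = tokens.shape
    d_head = d_model // n_heads

    def split(x):
        # (n_tokens, d_model) -> (n_heads, n_tokens, d_head)
        return x.reshape(n_tokens, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(n_tokens, d_model)
    return out, attn


# Toy setup: 6 genes grouped into 3 non-overlapping pathways (illustrative only).
rng = np.random.default_rng(0)
n_genes, n_pathways, d_model, n_heads = 6, 3, 8, 2
mask = np.array(
    [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]],
    dtype=float,
)
expr = rng.random(n_genes)
weights = rng.normal(size=(n_pathways, n_genes, d_model))
cls_token = rng.normal(size=(1, d_model))

# Prepend the class token, then run one self-attention layer.
tokens = np.vstack([cls_token, pathway_tokens(expr, mask, weights)])
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = multi_head_self_attention(tokens, wq, wk, wv, n_heads)

# Attention from the class token to each pathway token, averaged over heads,
# serves here as a per-pathway importance score for this cell.
pathway_scores = attn[:, 0, 1:].mean(axis=0)
```

In this sketch the class token's output row `out[0]` would feed a cell type classifier, while `pathway_scores` gives one interpretable weight per pathway. Restricting each token to the genes of one pathway is what ties the attention weights to named biological entities rather than to anonymous latent dimensions.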