Jiang Yijia, Hu Zhirui, Lynch Allen W, Jiang Junchen, Zhu Alexander, Zeng Ziqi, Zhang Yi, Wu Gongwei, Xie Yingtian, Li Rong, Zhou Ningxuan, Meyer Cliff, Cejas Paloma, Brown Myles, Long Henry W, Qiu Xintao
bioRxiv. 2024 Mar 25:2023.06.01.543296. doi: 10.1101/2023.06.01.543296.
Recent advances in single-cell epigenomic techniques have created a growing demand for scATAC-seq analysis. A key analysis task is determining cell-type identity from epigenetic data. We introduce scATAnno, a Python package designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. The workflow builds reference atlases from publicly available datasets and enables accurate cell-type annotation by integrating query data with these atlases, without requiring scRNA-seq data. To improve annotation accuracy, we incorporate k-nearest-neighbor (KNN)-based and weighted distance-based uncertainty scores that detect cell populations in the query data that are distinct from all cell types in the reference. We benchmark scATAnno against 7 other published cell annotation approaches and show superior performance across multiple datasets and metrics. We showcase the utility of scATAnno on peripheral blood mononuclear cell (PBMC), triple-negative breast cancer (TNBC), and basal cell carcinoma (BCC) datasets and demonstrate that it accurately annotates cell types across conditions. Overall, scATAnno is a useful tool for building scATAC-seq references and annotating cell types in scATAC-seq data, and it can aid the interpretation of new scATAC-seq datasets in complex biological systems.
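The abstract does not give the scoring formulas, so the following is only a rough illustration of the general idea behind KNN-based label transfer with uncertainty checks. It is a minimal sketch under stated assumptions, not scATAnno's implementation. The function `knn_annotate`, its parameters and its thresholds are hypothetical. It assumes query and reference cells are already embedded in a shared low-dimensional space. Its distance score is a simple unweighted ratio, which stands in for the weighted distance-based score the authors describe. Each query cell takes the majority label of its k nearest reference cells. The cell is flagged "unknown" if those neighbours disagree too much, or if it lies much farther from them than reference cells typically lie from each other.

```python
from collections import Counter

import numpy as np


def knn_annotate(ref_emb, ref_labels, query_emb, k=5,
                 max_vote_uncertainty=0.5, max_distance_ratio=2.0):
    """Hypothetical sketch (not scATAnno's API): majority-vote label transfer
    with a KNN vote uncertainty and an unweighted distance ratio."""
    ref = np.asarray(ref_emb, dtype=float)
    qry = np.asarray(query_emb, dtype=float)
    ref_labels = np.asarray(ref_labels)
    k = min(k, len(ref) - 1)  # need k neighbours per reference cell, excluding itself

    # Typical k-neighbour distance inside the reference (self-distance excluded).
    d_rr = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=2)
    np.fill_diagonal(d_rr, np.inf)
    ref_scale = np.median(np.sort(d_rr, axis=1)[:, :k].mean(axis=1))

    # Query-to-reference distances and the k closest reference cells per query cell.
    d_qr = np.linalg.norm(qry[:, None, :] - ref[None, :, :], axis=2)
    nn_idx = np.argsort(d_qr, axis=1)[:, :k]
    nn_dist = np.take_along_axis(d_qr, nn_idx, axis=1)

    labels, vote_unc, dist_ratio = [], [], []
    for idx, dist in zip(nn_idx, nn_dist):
        label, count = Counter(ref_labels[idx]).most_common(1)[0]
        u = 1.0 - count / k          # 0 when all k neighbours share one label
        r = dist.mean() / ref_scale  # >> 1 when the cell sits outside the reference
        novel = u > max_vote_uncertainty or r > max_distance_ratio
        labels.append("unknown" if novel else str(label))
        vote_unc.append(u)
        dist_ratio.append(r)
    return labels, np.array(vote_unc), np.array(dist_ratio)


# Toy 2-D "embedding": two reference cell types plus one query cell far from both.
ref = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 0), (10, 1), (11, 0), (11, 1), (10.5, 0.5)]
ref_labels = ["B"] * 5 + ["T"] * 5
query = [(0.2, 0.2), (10.2, 0.3), (50, 50)]
labels, vote_unc, dist_ratio = knn_annotate(ref, ref_labels, query, k=3)
print(labels)  # → ['B', 'T', 'unknown']
```

In this toy example the third query cell has unanimous neighbours, so its vote uncertainty is 0. It is still flagged as unknown, because its distance ratio is far above the threshold. This illustrates why a distance-based score is useful alongside the vote-based one for spotting populations absent from the reference.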