Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA, 30322, USA.
Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, P. R. China.
Nat Commun. 2023 Apr 3;14(1):1864. doi: 10.1038/s41467-023-37439-3.
Computational cell type identification is a fundamental step in single-cell omics data analysis. Supervised celltyping methods have become increasingly popular for single-cell RNA-seq data because of their superior performance and the availability of high-quality reference datasets. Recent technological advances in profiling chromatin accessibility at single-cell resolution (scATAC-seq) have brought new insights into epigenetic heterogeneity. As scATAC-seq datasets continue to accumulate, a supervised celltyping method designed specifically for scATAC-seq is urgently needed. Here we develop Cellcano, a computational method based on a two-round supervised learning algorithm to identify cell types from scATAC-seq data. The method alleviates the distributional shift between reference and target data and improves prediction performance. After systematically benchmarking Cellcano on 50 well-designed celltyping tasks from various datasets, we show that Cellcano is accurate, robust, and computationally efficient. Cellcano is well documented and freely available at https://marvinquiet.github.io/Cellcano/ .
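To make the idea of two-round supervised learning concrete, the following is a minimal, generic sketch and not Cellcano's actual implementation. It assumes a simple nearest-centroid classifier in place of whatever model the method uses, an entropy-based confidence score, and a hypothetical `anchor_frac` parameter. Round one trains on the reference and labels the target. Round two retrains on the most confident target cells ("anchors") so that the final classifier is fitted to the target distribution itself. All function names here are illustrative.

```python
import numpy as np

def fit_centroids(X, y):
    """Return the sorted class labels and one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_proba(X, centroids):
    """Softmax over negative squared distances to each centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def two_round_predict(X_ref, y_ref, X_tgt, anchor_frac=0.4):
    """Generic two-round pseudo-labeling (illustrative assumption only)."""
    # Round 1: train on the reference, predict every target cell.
    classes, cent = fit_centroids(X_ref, y_ref)
    proba = predict_proba(X_tgt, cent)
    round1 = classes[proba.argmax(axis=1)]
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)

    # Pick anchors: the lowest-entropy cells within each predicted class,
    # so every predicted class stays represented in round 2
    # (a design choice of this sketch).
    picked = []
    for c in np.unique(round1):
        idx = np.flatnonzero(round1 == c)
        k = max(1, int(anchor_frac * len(idx)))
        picked.append(idx[np.argsort(entropy[idx])[:k]])
    anchor_idx = np.concatenate(picked)

    # Round 2: train on anchors with their round-1 labels,
    # then re-predict the remaining (non-anchor) target cells.
    classes2, cent2 = fit_centroids(X_tgt[anchor_idx], round1[anchor_idx])
    final = round1.copy()
    rest = np.setdiff1d(np.arange(len(X_tgt)), anchor_idx)
    if len(rest):
        proba2 = predict_proba(X_tgt[rest], cent2)
        final[rest] = classes2[proba2.argmax(axis=1)]
    return final, anchor_idx
```

Because the second-round model is trained only on target cells, its decision boundaries follow the target data rather than the reference, which is one plausible way a two-round scheme can reduce the effect of distributional shift.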