Benchmarking automated cell type annotation tools for single-cell ATAC-seq data.

Authors

Wang Yuge, Sun Xingzhi, Zhao Hongyu

Affiliations

Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States.

Department of Statistics and Data Science, Yale University, New Haven, CT, United States.

Publication information

Front Genet. 2022 Dec 13;13:1063233. doi: 10.3389/fgene.2022.1063233. eCollection 2022.

DOI:10.3389/fgene.2022.1063233

PMID:36583014

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9792779/

Abstract

As single-cell chromatin accessibility profiling methods advance, scATAC-seq has become ever more important in the study of candidate regulatory genomic regions and their roles in developmental, evolutionary, and disease processes. At the same time, cell type annotation is critical for understanding the cellular composition of complex tissues and identifying potential novel cell types. However, most existing methods for automated cell type annotation are designed to transfer labels from an annotated scRNA-seq dataset to another scRNA-seq dataset, and it is not clear whether these methods can be adapted to annotate scATAC-seq data. Several methods have recently been proposed for label transfer from scRNA-seq data to scATAC-seq data, but a benchmarking study of their performance is lacking. Here, we evaluated five scATAC-seq annotation methods in terms of both classification accuracy and scalability, using publicly available single-cell datasets from mouse and human tissues including brain, lung, kidney, peripheral blood mononuclear cells (PBMC), and bone marrow mononuclear cells (BMMC). Using the BMMC data as a basis, we further investigated the performance of these methods across different data sizes, mislabeling rates, sequencing depths, and numbers of cell types unique to scATAC-seq. Bridge integration, the only method that requires additional multimodal data and does not need gene activity calculation, was the best method overall and was robust to changes in data size, mislabeling rate, and sequencing depth. Conos was the most time- and memory-efficient method but had the lowest prediction accuracy. scJoint tended to assign cells to similar cell types; it performed relatively poorly on complex datasets with deep annotations but better on datasets annotated only with major labels. The performance of scGCN and Seurat v3 was moderate, but scGCN was the most time-consuming method and, for cell types unique to scATAC-seq, performed most similarly to a random classifier.
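To make the evaluation design in the abstract more concrete, the following is a minimal sketch of how classification accuracy could be scored for transferred labels and how a given mislabeling rate could be simulated in reference labels. It is not the authors' code: the function names (`accuracy`, `macro_f1`, `mislabel`), the choice of macro-averaged F1 as a second metric, the uniform random reassignment scheme, and the toy cell type labels are all assumptions made for illustration.

```python
import random


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 over the classes present in the truth."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for cls in sorted(set(true_labels)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mislabel(labels, rate, seed=0):
    """Copy `labels`, reassigning round(rate * n) random cells to a different label.

    Hypothetical corruption scheme: each selected cell gets a label drawn
    uniformly from the other labels observed in the input.
    """
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two distinct labels to mislabel")
    rng = random.Random(seed)
    corrupted = list(labels)
    n_flip = round(rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        corrupted[i] = rng.choice([c for c in classes if c != labels[i]])
    return corrupted


# Toy example: six cells with made-up labels and predictions.
truth = ["B", "B", "T", "T", "NK", "NK"]
predicted = ["B", "B", "T", "NK", "NK", "NK"]
print("accuracy:", accuracy(truth, predicted))
print("macro F1:", macro_f1(truth, predicted))
print("reference with 50% mislabeling:", mislabel(truth, 0.5, seed=1))
```

In a benchmark of this kind, the corrupted labels would stand in for the scRNA-seq reference annotation, and the metrics would be computed on the scATAC-seq cells against their held-out true labels.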

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec9/9792779/fa5242e144aa/fgene-13-1063233-g001.jpg

Similar articles

Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data.

Genomics Proteomics Bioinformatics. 2024 Jul 3;22(2). doi: 10.1093/gpbjnl/qzae014.

Evaluation of classification in single cell atac-seq data with machine learning methods.

BMC Bioinformatics. 2022 Sep 21;23(Suppl 5):249. doi: 10.1186/s12859-022-04774-z.

scATAnno: Automated Cell Type Annotation for single-cell ATAC Sequencing Data.

bioRxiv. 2024 Mar 25:2023.06.01.543296. doi: 10.1101/2023.06.01.543296.

scNCL: transferring labels from scRNA-seq to scATAC-seq data with neighborhood contrastive regularization.

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad505.

AtacAnnoR: a reference-based annotation tool for single cell ATAC-seq data.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad268.

scATAcat: cell-type annotation for scATAC-seq data.

NAR Genom Bioinform. 2024 Oct 8;6(4):lqae135. doi: 10.1093/nargab/lqae135. eCollection 2024 Sep.

scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning.

Nat Biotechnol. 2022 May;40(5):703-710. doi: 10.1038/s41587-021-01161-6. Epub 2022 Jan 20.

Assessment of computational methods for the analysis of single-cell ATAC-seq data.

Genome Biol. 2019 Nov 18;20(1):241. doi: 10.1186/s13059-019-1854-5.

A Unified Deep Learning Framework for Single-Cell ATAC-Seq Analysis Based on ProdDep Transformer Encoder.

Int J Mol Sci. 2023 Mar 1;24(5):4784. doi: 10.3390/ijms24054784.

Cited by

Cell-Type Annotation for scATAC-Seq Data by Integrating Chromatin Accessibility and Genome Sequence.

Biomolecules. 2025 Jun 27;15(7):938. doi: 10.3390/biom15070938.

Hyperedge Representations with Hypergraph Wavelets: Applications to Spatial Transcriptomics.

ArXiv. 2024 Sep 14:arXiv:2409.09469v1.

Fast clustering and cell-type annotation of scATAC data using pre-trained embeddings.

NAR Genom Bioinform. 2024 Jul 5;6(3):lqae073. doi: 10.1093/nargab/lqae073. eCollection 2024 Sep.

HyGAnno: hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae152.

References

Chromatin accessibility profiling methods.

Nat Rev Methods Primers. 2021;1. doi: 10.1038/s43586-020-00008-9. Epub 2021 Jan 21.

Dictionary learning for integrative, multimodal and scalable single-cell analysis.

Nat Biotechnol. 2024 Feb;42(2):293-304. doi: 10.1038/s41587-023-01767-y. Epub 2023 May 25.

scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning.

Nat Biotechnol. 2022 May;40(5):703-710. doi: 10.1038/s41587-021-01161-6. Epub 2022 Jan 20.

Single-cell chromatin state analysis with Signac.

Nat Methods. 2021 Nov;18(11):1333-1341. doi: 10.1038/s41592-021-01282-5. Epub 2021 Nov 1.

scGCN is a graph convolutional networks algorithm for knowledge transfer in single cell omics.

Nat Commun. 2021 Jun 22;12(1):3826. doi: 10.1038/s41467-021-24172-y.

Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods.

Nat Protoc. 2021 Jun;16(6):2749-2764. doi: 10.1038/s41596-021-00534-0. Epub 2021 May 24.

Single cell regulatory landscape of the mouse kidney highlights cellular differentiation programs and disease targets.

Nat Commun. 2021 Apr 15;12(1):2277. doi: 10.1038/s41467-021-22266-1.

Automated methods for cell type annotation on scRNA-seq data.

Comput Struct Biotechnol J. 2021 Jan 19;19:961-969. doi: 10.1016/j.csbj.2021.01.015. eCollection 2021.

The epigenetic basis of cellular heterogeneity.

Nat Rev Genet. 2021 Apr;22(4):235-250. doi: 10.1038/s41576-020-00300-0. Epub 2020 Nov 26.

Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin.

Cell. 2020 Nov 12;183(4):1103-1116.e20. doi: 10.1016/j.cell.2020.09.056. Epub 2020 Oct 23.
