Peter Munk Cardiac Centre, University Health Network, Toronto, Canada.
Department of Statistical Sciences, University of Toronto, Toronto, Canada.
Genome Biol. 2021 Mar 4;22(1):74. doi: 10.1186/s13059-021-02270-w.
Single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) identifies regulated chromatin accessibility modules at single-cell resolution. Robust evaluation is critical to the development of scATAC-seq pipelines, which calls for reproducible datasets for benchmarking. We present simATAC, an R package that generates scATAC-seq count matrices that closely resemble real scATAC-seq datasets in library size, sparsity, and chromatin accessibility signals. simATAC uses statistical models derived from the analysis of 90 real scATAC-seq cell groups. It provides a robust and systematic approach to generating in silico scATAC-seq samples with known cell labels for assessing analytical pipelines.
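As a rough illustration of two of the summary statistics named in the abstract, the sketch below computes library size and sparsity for a toy count matrix. It is not simATAC's API: the function names, the cells-by-bins orientation, and the toy values are assumptions made only for this example.

```python
# Illustrative sketch only; names and matrix layout are assumptions,
# not part of the simATAC package.

def library_sizes(counts):
    """Return the total read count per cell (row sums of a cells-by-bins matrix)."""
    return [sum(row) for row in counts]


def sparsity(counts):
    """Return the fraction of matrix entries that are zero."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for value in row if value == 0)
    return zeros / total


# Hypothetical toy matrix: 3 cells (rows) by 4 genomic bins (columns).
toy_counts = [
    [0, 2, 0, 1],
    [0, 0, 0, 3],
    [1, 0, 0, 0],
]

print(library_sizes(toy_counts))  # [3, 3, 1]
print(sparsity(toy_counts))       # 8 of 12 entries are zero
```

Comparing these quantities between a simulated and a real matrix is one simple way to check the kind of resemblance the abstract describes.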