NVIDIA Corporation, Santa Clara, CA, USA.
Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.
Nat Commun. 2021 Mar 8;12(1):1507. doi: 10.1038/s41467-021-21765-5.
ATAC-seq is a widely applied assay used to measure genome-wide chromatin accessibility; however, its ability to detect active regulatory regions can depend on the depth of sequencing coverage and the signal-to-noise ratio. Here we introduce AtacWorks, a deep learning toolkit to denoise sequencing coverage and identify regulatory peaks at base-pair resolution from low cell count, low-coverage, or low-quality ATAC-seq data. Models trained by AtacWorks can detect peaks from cell types not seen in the training data, and are generalizable across diverse sample preparations and experimental platforms. We demonstrate that AtacWorks enhances the sensitivity of single-cell experiments by producing results on par with those of conventional methods using ~10 times as many cells, and further show that this framework can be adapted to enable cross-modality inference of protein-DNA interactions. Finally, we establish that AtacWorks can enable new biological discoveries by identifying active regulatory regions associated with lineage priming in rare subpopulations of hematopoietic stem cells.