Department of Biochemistry, University at Buffalo, Buffalo, NY 14203, USA.
Enhanced Pharmacodynamics LLC, Buffalo, NY 14203, USA.
Nucleic Acids Res. 2019 Sep 19;47(16):e91. doi: 10.1093/nar/gkz533.
ATAC-seq has been widely adopted to identify accessible chromatin regions across the genome. However, current data analysis still relies on approaches originally designed for ChIP-seq or DNase-seq, which do not take into account that the transposase-digested DNA fragments carry additional nucleosome positioning information. We present the first dedicated ATAC-seq analysis tool, a semi-supervised machine learning approach named HMMRATAC. HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome. We show that HMMRATAC outperforms popular peak-calling algorithms on published human ATAC-seq datasets. We also find that single-end sequenced or size-selected ATAC-seq datasets result in a loss of sensitivity compared to paired-end datasets without size selection.
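To illustrate the idea of splitting one dataset into nucleosome-free and nucleosome-enriched signals, the following is a minimal sketch that bins paired-end fragments by length. It is not HMMRATAC's implementation: the fixed cutoffs (`NFR_MAX_LEN`, `MONO_MIN_LEN`, `MONO_MAX_LEN`), the function names, and the `(chrom, start, end)` tuple layout are all assumptions made for illustration, and the abstract does not state how the tool itself determines the split.

```python
# Hypothetical fixed cutoffs in base pairs, chosen only for illustration.
# The abstract does not give the values or the method HMMRATAC uses.
NFR_MAX_LEN = 100    # fragments shorter than this count as nucleosome-free
MONO_MIN_LEN = 180   # lower bound for a mono-nucleosome-sized fragment
MONO_MAX_LEN = 250   # upper bound for a mono-nucleosome-sized fragment


def classify_fragment(length):
    """Assign a fragment length (bp) to a coarse size class."""
    if length <= 0:
        raise ValueError("fragment length must be positive")
    if length < NFR_MAX_LEN:
        return "nucleosome_free"
    if MONO_MIN_LEN <= length <= MONO_MAX_LEN:
        return "mono_nucleosome"
    return "other"


def split_fragments(fragments):
    """Split (chrom, start, end) fragments into per-class lists.

    Coordinates are assumed to be 0-based, half-open, so the fragment
    length is simply end - start.
    """
    signals = {"nucleosome_free": [], "mono_nucleosome": [], "other": []}
    for chrom, start, end in fragments:
        signals[classify_fragment(end - start)].append((chrom, start, end))
    return signals


# Toy input: fragment lengths of 60, 200 and 400 bp.
example = [("chr1", 100, 160), ("chr1", 500, 700), ("chr1", 1000, 1400)]
split = split_fragments(example)
```

A split of this kind needs the full fragment length, which is observed directly only when both ends of a fragment are sequenced. That is consistent with the reported loss of sensitivity for single-end or size-selected datasets.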