Computational Neurogenomics, D-HEST Institute for Neurosciences, Zürich, Switzerland.
Systems Neuroscience, D-HEST Institute for Neurosciences, Zürich, Switzerland.
PLoS Comput Biol. 2024 Oct 23;20(10):e1011971. doi: 10.1371/journal.pcbi.1011971. eCollection 2024 Oct.
ATAC-seq has emerged as a rich epigenome profiling technique and is commonly used to identify the transcription factors (TFs) underlying given phenomena. A number of methods can identify differentially active TFs through the accessibility of their DNA-binding motifs; however, little is known about the best approaches for doing so. Here we benchmark several such methods using a combination of curated datasets with various forms of short-term perturbations on known TFs, as well as semi-simulations. We include both methods specifically designed for this type of data and some that can be repurposed for it. We also investigate variations of these methods, and identify three particularly promising approaches (a chromVAR-limma workflow with critical adjustments, monaLisa, and a combination of GC smooth quantile normalization and multivariate modeling). We further investigate the specific use of nucleosome-free fragments, the combination of top methods, and the impact of technical variation. Finally, we illustrate the use of the top methods on a novel dataset to characterize the impact on DNA accessibility of TRAnscription Factor TArgeting Chimeras (TRAFTAC), which can deplete TFs (in our case NFkB) at the protein level.