Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA.
Bioinformatics and Integrative Genomics PhD Program, Harvard University, Cambridge, Massachusetts 02138, USA.
Genome Res. 2020 May;30(5):736-748. doi: 10.1101/gr.260877.120. Epub 2020 May 18.
Deciphering the interplay between chromatin accessibility and transcription factor (TF) binding is fundamental to understanding transcriptional regulation, control of cellular states, and the establishment of new phenotypes. Recent genome-wide chromatin accessibility profiling studies have provided catalogs of putative open regions, where TFs can recognize their motifs and regulate gene expression programs. Here, we present motif enrichment in differential elements of accessibility (MEDEA), a computational tool that analyzes high-throughput chromatin accessibility genomic data to identify cell-type-specific accessible regions and lineage-specific motifs associated with TF binding therein. To benchmark MEDEA, we used a panel of reference cell lines profiled by ENCODE and curated by the ENCODE Project Consortium for the ENCODE-DREAM Challenge. By comparing results with RNA-seq data, ChIP-seq peaks, and DNase-seq footprints, we show that MEDEA improves the detection of motifs associated with known lineage specifiers. We then applied MEDEA to 610 ENCODE DNase-seq data sets, where it revealed significant motifs even when absolute enrichment was low and where it identified novel regulators, such as NRF1 in kidney development. Finally, we show that MEDEA performs well on both bulk and single-cell ATAC-seq data. MEDEA is publicly available as part of our Glossary-GENRE suite for motif enrichment analysis.