Graduate Program of Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, CA, USA.
Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA, USA.
Bioinformatics. 2019 Sep 15;35(18):3287-3293. doi: 10.1093/bioinformatics/btz079.
Increasing evidence shows that nucleotide modifications such as cytosine methylation and hydroxymethylation can greatly affect the binding of transcription factors (TFs). However, motif-finding algorithms that can search for motifs containing modified bases are lacking. In this study, we extend our previous motif-finding pipeline, Epigram, to provide systematic de novo discovery and performance evaluation of methylated DNA motifs.
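To make the idea of searching for motifs with modified bases concrete, the following minimal sketch extends the DNA alphabet with one extra symbol for methylated cytosine and scores k-mers by their enrichment in bound sequences relative to a background set. It is an illustration only, not the mEpigram implementation: the symbol `E`, the function names, the pseudocount, and the single-strand treatment are all assumptions made for this example.

```python
from collections import Counter

# Assumed symbol for 5-methylcytosine in the extended alphabet (illustrative).
METHYLATED_C = "E"


def encode_methylation(seq, methylated_positions):
    """Return seq with C replaced by METHYLATED_C at the given 0-based positions."""
    chars = list(seq.upper())
    for pos in methylated_positions:
        if chars[pos] != "C":
            raise ValueError(f"position {pos} holds {chars[pos]!r}, not C")
        chars[pos] = METHYLATED_C
    return "".join(chars)


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Ratio of smoothed k-mer frequencies, foreground over background.

    Hypothetical scoring: a pseudocount keeps k-mers that are absent
    from the background from dividing by zero.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + pseudocount
    bg_total = sum(bg.values()) + pseudocount
    return {
        kmer: ((n + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)
        for kmer, n in fg.items()
    }


if __name__ == "__main__":
    bound = [encode_methylation("ACGACG", [1, 4])]   # methylated CpGs
    control = ["ACGACG"]                             # same sequence, unmethylated
    scores = kmer_enrichment(bound, control, k=2)
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(kmer, round(score, 2))
```

Because the methylated base is an ordinary alphabet symbol here, k-mers such as `EG` are counted separately from `CG`, so a methylation-dependent signal can stand out even when the underlying sequence is identical. A fuller treatment would also need to handle the reverse strand and build position weight matrices from enriched k-mers, which this sketch omits.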
mEpigram outperforms both MEME and DREME at finding modified motifs in simulated data that mimic various motif enrichment scenarios. Furthermore, we identified methylated motifs in Arabidopsis DNA affinity purification sequencing (DAP-seq) data that were previously shown to contain such motifs. When applied to TF ChIP-seq and DNA methylome data from H1 and GM12878 cells, our method identified novel methylated motifs that can be recognized by the TFs or their co-factors. We also observed a spacing constraint between the canonical motif of the TF and the newly discovered methylated motifs, which suggests cooperative recognition of these cis-elements by collaborating proteins.
The mEpigram program is available at http://wanglab.ucsd.edu/star/mEpigram.
Supplementary data are available at Bioinformatics online.