Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium.
VIB Center for Plant Systems Biology, 9052 Ghent, Belgium.
Plant Physiol. 2019 Oct;181(2):412-425. doi: 10.1104/pp.19.00605. Epub 2019 Jul 25.
Determining where transcription factors (TFs) bind in genomes provides insight into which transcriptional programs are active across organs, tissue types, and environmental conditions. Recent advances in high-throughput profiling of regulatory DNA have yielded large amounts of information about chromatin accessibility. Interpreting the functional significance of these data sets requires knowledge of which regulators are likely to bind these regions. This can be achieved by using information about TF-binding preferences, or motifs, to identify TF-binding events that are likely to be functional. Although different approaches exist to map motifs to DNA sequences, a systematic evaluation of these tools in plants is missing. Here, we compare four motif-mapping tools widely used in the Arabidopsis (Arabidopsis thaliana) research community and evaluate their performance using chromatin immunoprecipitation data sets for 40 TFs. Downstream gene regulatory network (GRN) reconstruction was found to be sensitive to the motif mapper used. We further show that the low recall of Find Individual Motif Occurrences, one of the most frequently used motif-mapping tools, can be overcome by using an Ensemble approach, which combines results from different mapping tools. Several examples are provided demonstrating how the Ensemble approach extends our view on transcriptional control for TFs active in different biological processes. Finally, a protocol is presented to effectively derive more complete cell type-specific GRNs through the integrative analysis of open chromatin regions, known binding site information, and expression data sets. This approach will pave the way to increase our understanding of GRNs in different cellular conditions.
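To make the idea of combining mapping tools concrete, the following is a minimal sketch and not the method used in the study. It assumes that each tool's output has already been reduced to a set of (TF, target gene) pairs and that the combination rule is a simple union; the abstract does not specify either. The names `tool_a`, `tool_b`, `ensemble`, `recall`, and `precision`, as well as all gene and TF labels, are hypothetical, and the data are made-up toy values.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: one motif match = (TF name, target gene).
Match = Tuple[str, str]


def ensemble(matches_by_tool: Dict[str, Set[Match]]) -> Set[Match]:
    """Combine per-tool matches by taking their union (an assumed rule)."""
    combined: Set[Match] = set()
    for hits in matches_by_tool.values():
        combined |= hits
    return combined


def recall(predicted: Set[Match], reference: Set[Match]) -> float:
    """Fraction of reference (e.g. ChIP-supported) pairs that were predicted."""
    if not reference:
        return 0.0
    return len(predicted & reference) / len(reference)


def precision(predicted: Set[Match], reference: Set[Match]) -> float:
    """Fraction of predicted pairs that are supported by the reference."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


# Toy data, invented purely for illustration.
matches_by_tool: Dict[str, Set[Match]] = {
    "tool_a": {("TF1", "geneA"), ("TF1", "geneB")},
    "tool_b": {("TF1", "geneB"), ("TF1", "geneC"), ("TF1", "geneD")},
}
chip_reference: Set[Match] = {
    ("TF1", "geneA"),
    ("TF1", "geneB"),
    ("TF1", "geneC"),
    ("TF1", "geneE"),
}

combined = ensemble(matches_by_tool)
for name, hits in {**matches_by_tool, "ensemble": combined}.items():
    print(name, recall(hits, chip_reference), precision(hits, chip_reference))
```

In this toy case each tool alone recovers half of the reference pairs, while the union recovers three of four. A union can only raise recall or leave it unchanged, and it may lower precision, which is why both measures are reported side by side.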