Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
PLoS Comput Biol. 2011 Feb 10;7(2):e1001070. doi: 10.1371/journal.pcbi.1001070.
Transcription factors are a major component of gene regulation, as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology that has not yet been fully solved. Here, we present Dispom, a de-novo motif discovery tool for finding differentially abundant transcription factor binding sites that models the positional preferences of binding sites and adjusts the motif length during learning. In our evaluation, Dispom's prediction performance is superior to that of existing de-novo motif discovery tools on 18 benchmark data sets with planted binding sites and on a metazoan compendium based on experimental data from microarray, ChIP-chip, ChIP-DSL, and DamID experiments as well as Gene Ontology data. Finally, we apply Dispom to find binding sites that are differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin-responsive element predominantly positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to a common anchor point, such as the transcription start site in the case of promoter sequences. We demonstrate that combining the search for differentially abundant motifs with inferring a positional distribution from the data is beneficial for de-novo motif discovery.
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom.
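To make the idea of combining a motif model with a positional distribution concrete, here is a minimal, hedged sketch in Python. It is not Dispom's model or the Jstacs API: Dispom learns its parameters discriminatively and also adapts the motif length, none of which is reproduced here. The sketch assumes a fixed position weight matrix (PWM), an i.i.d. uniform background, exactly one motif occurrence per foreground sequence, and a hypothetical discretized Gaussian prior over motif start positions. All function names (pwm_from_consensus, gaussian_position_prior, motif_log_ratio, posterior_foreground) are illustrative. The sequence score is the log of the sum over start positions l of prior(l) times the PWM-versus-background likelihood ratio of the window at l, and a logistic function of that score gives a two-class posterior for "foreground" (motif-bearing) versus background. The motif TGTCTC, the canonical AuxRE core, is used only as a toy input.

```python
import math

ALPHABET = "ACGT"


def pwm_from_consensus(consensus, match_prob=0.85):
    """Toy PWM (hypothetical): match_prob for the consensus base, rest split evenly."""
    other = (1.0 - match_prob) / 3
    return [[match_prob if b == c else other for b in ALPHABET] for c in consensus]


def window_log_ratio(pwm, window, background):
    """log P(window | PWM) - log P(window | i.i.d. background)."""
    total = 0.0
    for column, base in zip(pwm, window):
        i = ALPHABET.index(base)
        total += math.log(column[i]) - math.log(background[i])
    return total


def gaussian_position_prior(seq_len, motif_len, center, width):
    """Assumed discretized Gaussian over motif start positions, normalized to 1."""
    n = seq_len - motif_len + 1
    weights = [math.exp(-0.5 * ((l - center) / width) ** 2) for l in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def motif_log_ratio(seq, pwm, background, prior):
    """log sum_l prior[l] * LR_l: one motif occurrence whose start l is drawn
    from `prior`, versus a sequence generated entirely by the background.
    Positions outside the window cancel because both models use the background there."""
    w = len(pwm)
    terms = [math.log(p) + window_log_ratio(pwm, seq[l:l + w], background)
             for l, p in enumerate(prior) if p > 0]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def posterior_foreground(log_ratio, log_prior_odds=0.0):
    """P(foreground | seq) for a two-class model: numerically stable logistic."""
    x = log_ratio + log_prior_odds
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Toy usage: 30-bp sequences, a 6-bp motif, position prior centred on start 12.
background = [0.25] * 4
pwm = pwm_from_consensus("TGTCTC")
prior = gaussian_position_prior(30, 6, center=12, width=2.0)
uniform = [1.0 / 25] * 25  # 30 - 6 + 1 possible start positions

seq_center = "A" * 12 + "TGTCTC" + "A" * 12
seq_edge = "TGTCTC" + "A" * 24
seq_none = "A" * 30

for name, s in [("center", seq_center), ("edge", seq_edge), ("none", seq_none)]:
    score = motif_log_ratio(s, pwm, background, prior)
    print(name, round(score, 2), round(posterior_foreground(score), 4))
```

With the positional prior, the occurrence at the preferred start position scores far higher than the same occurrence at the sequence edge; with the uniform prior, the two placements score almost identically. This illustrates, in toy form, why a position distribution helps most when sequences share an anchor point such as the transcription start site.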