Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles CP 263, Campus Plaine, Boulevard du Triomphe, B-1050 Bruxelles, Belgium.
Bioinformatics. 2009 Oct 15;25(20):2715-22. doi: 10.1093/bioinformatics/btp490. Epub 2009 Aug 18.
Discovering cis-regulatory elements in genome sequences remains a challenging problem. Several methods rely on optimizing a target scoring function. The information content (IC), or relative entropy, of the motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually used as a posteriori statistics rather than during the motif search process itself.
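To make the IC concrete, here is a minimal sketch of the standard relative-entropy formula for a set of aligned sites, IC = sum over columns i and bases b of f(i,b) * log2(f(i,b) / p(b)). The function name `information_content`, the uniform default background, and the optional `pseudo` pseudocount are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def information_content(sites, background=None, pseudo=0.0):
    """Relative entropy (in bits) of aligned, equal-length DNA sites.

    background: dict of base -> prior probability (assumed uniform if omitted).
    pseudo: total pseudocount weight, distributed according to the background.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    n = len(sites)
    width = len(sites[0])
    ic = 0.0
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        for b in ALPHABET:
            f = (counts[b] + pseudo * background[b]) / (n + pseudo)
            if f > 0:  # a term with f == 0 contributes nothing
                ic += f * math.log2(f / background[b])
    return ic
```

A perfectly conserved column contributes 2 bits under a uniform background, while a column that matches the background contributes 0 bits.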
We introduce here info-gibbs, a Gibbs sampling algorithm that efficiently optimizes the IC or the log-likelihood ratio (LLR) of the motif while keeping computation time low. The method compares well with existing methods such as MEME, BioProspector, Gibbs or GAME on both synthetic and biological datasets. Our study shows that motif discovery techniques can be enhanced by directly focusing the search on the motif IC or the motif LLR.
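The general idea of a Gibbs site sampler that tracks the motif IC can be sketched as follows. This is a generic textbook-style sampler written under stated assumptions (one site per sequence, a uniform background, a fixed pseudocount, and the hypothetical names `gibbs_sample`, `column_freqs` and `motif_ic`); it is not the info-gibbs implementation, whose update rule and parameters are not described in this abstract.

```python
import math
import random

ALPHABET = "ACGT"

def column_freqs(sites, width, pseudo=1.0):
    """Per-column base frequencies with a uniform pseudocount (assumption)."""
    freqs = []
    for i in range(width):
        col = {b: pseudo / 4 for b in ALPHABET}
        for s in sites:
            col[s[i]] += 1
        total = len(sites) + pseudo
        freqs.append({b: col[b] / total for b in ALPHABET})
    return freqs

def motif_ic(sites, width, bg):
    """Relative entropy (bits) of the motif built from the given sites."""
    freqs = column_freqs(sites, width)
    return sum(f[b] * math.log2(f[b] / bg[b]) for f in freqs for b in ALPHABET)

def gibbs_sample(seqs, width, iterations=200, seed=0):
    """Resample one site per sequence and keep the alignment with the best IC."""
    rng = random.Random(seed)
    bg = {b: 0.25 for b in ALPHABET}  # uniform background (assumption)
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    best_pos = list(pos)
    best_ic = motif_ic([s[p:p + width] for s, p in zip(seqs, pos)], width, bg)
    for _ in range(iterations):
        for k, seq in enumerate(seqs):
            # Build the motif from every sequence except the one being updated.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, pos)) if j != k]
            freqs = column_freqs(others, width)
            # Weight each candidate start by its likelihood ratio vs. background.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for i in range(width):
                    b = seq[start + i]
                    w *= freqs[i][b] / bg[b]
                weights.append(w)
            pos[k] = rng.choices(range(len(weights)), weights=weights)[0]
            ic = motif_ic([s[p:p + width] for s, p in zip(seqs, pos)], width, bg)
            if ic > best_ic:
                best_ic, best_pos = ic, list(pos)
    return best_pos, best_ic
```

Calling `gibbs_sample(seqs, width=6)` on a list of DNA strings returns the best start positions found and their IC. Without pseudocounts, the LLR of N aligned sites equals N times the IC (with both in the same log base), so tracking either score ranks alignments of a fixed number of sites identically.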