CMPG, Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Leuven, Belgium.
PLoS One. 2010 Feb 3;5(2):e8938. doi: 10.1371/journal.pone.0008938.
Computational de novo discovery of transcription factor binding sites remains a challenging problem. The growing number of sequenced genomes makes it possible to integrate orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and should therefore be well suited to using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model.
We designed real and synthetic datasets covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of motif detection algorithms with an evolutionary model, performed compared to MEME, a more classical motif detection algorithm that treats orthologs independently.
Under certain conditions, detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insight into this relationship is useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.