Wilczynski Bartek, Dojer Norbert, Patelak Mateusz, Tiuryn Jerzy
Institute of Informatics, University of Warsaw, Warsaw, Poland.
BMC Bioinformatics. 2009 Mar 10;10:82. doi: 10.1186/1471-2105-10-82.
Finding functional regulatory elements in DNA sequences is an important problem in computational biology, and a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. The major obstacles are that the amount of non-coding DNA is vast and that methods for predicting functional transcription factor binding sites tend to produce a high percentage of false positives. Together, these make it difficult to find regions that are significantly enriched in binding sites.
We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms.
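To make the idea of order-independent conservation concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes motif hits are available as (position, motif name) pairs and compares two windows from different species by their motif content alone, using a multiset overlap score. The function names, the example motif names, and the window boundaries are all hypothetical.

```python
from collections import Counter


def motif_counts(hits, start, end):
    """Count motif names whose hit position falls in [start, end)."""
    return Counter(name for pos, name in hits if start <= pos < end)


def order_free_similarity(counts_a, counts_b):
    """Multiset overlap (weighted Jaccard): shared hits / all hits.

    Positions inside the window are ignored, so rearranged motifs
    score the same as motifs kept in the original order.
    """
    shared = sum((counts_a & counts_b).values())
    total = sum((counts_a | counts_b).values())
    return shared / total if total else 0.0


# Hypothetical motif hits in orthologous regions of two species.
# The same three motifs occur in both, but in a different order.
species_a = [(10, "bcd"), (25, "hb"), (40, "kr"), (300, "gt")]
species_b = [(12, "kr"), (30, "bcd"), (44, "hb"), (500, "gt")]

window_a = motif_counts(species_a, 0, 100)
window_b = motif_counts(species_b, 0, 100)
score = order_free_similarity(window_a, window_b)

# A narrower window in species A loses one motif and scores lower.
partial_score = order_free_similarity(motif_counts(species_a, 0, 30), window_b)
```

Because the score depends only on which motifs occur and how often, a window whose motifs have been shuffled during evolution still matches its ortholog fully, whereas an alignment-based comparison would penalize the rearrangement.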
We show that our approach finds a majority of the known cis-regulatory modules (CRMs) using only sequence information from different species together with currently publicly available motif data. Our method is also robust enough to perform well in predicting CRMs despite differences in tissue specificity, and even across species, provided that the evolutionary distances between the compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it can be readily applied in practice.