Sinha S, Tompa M
Department of Computer Science and Engineering, University of Washington, Seattle 98195-2350, USA.
Proc Int Conf Intell Syst Mol Biol. 2000;8:344-54.
Understanding the mechanisms that regulate gene expression is an important and challenging problem. A fundamental subproblem is to identify DNA-binding sites for unknown regulatory factors, given a collection of genes believed to be coregulated and the noncoding DNA sequences near those genes. We present an enumerative statistical method for identifying good candidates for such transcription factor binding sites. Unlike local search techniques such as Expectation Maximization and Gibbs sampling, which may not reach a global optimum, the method proposed here is guaranteed to produce the motifs with the greatest z-scores. We discuss the results of experiments in which this algorithm was used to locate candidate binding sites in several well-studied pathways of S. cerevisiae, as well as in gene clusters from some hybridization microarray experiments.
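The enumerative idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the motif model, the background model, or how the z-score is computed, so everything below is an assumption made for illustration. The function name `top_motifs_by_zscore`, exact k-mer motifs, an independent-base background estimated from the input sequences, and a binomial variance that ignores overlapping occurrences are all hypothetical simplifications. The point it shows is only that exhaustive enumeration scores every candidate, so the highest-scoring motif under the chosen model cannot be missed.

```python
from itertools import product
from math import sqrt


def top_motifs_by_zscore(sequences, k=4, top=5):
    """Score every possible k-mer by the z-score of its occurrence count.

    Assumptions (not from the source): sequences are uppercase ACGT strings,
    the background is an independent-base model estimated from the input,
    and occurrences are treated as independent (binomial variance).
    """
    total = sum(len(s) for s in sequences)
    freq = {b: sum(s.count(b) for s in sequences) / total for b in "ACGT"}

    # Number of positions at which a k-mer can start.
    positions = sum(max(len(s) - k + 1, 0) for s in sequences)

    # Observed count of every k-mer that actually occurs.
    counts = {}
    for s in sequences:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            counts[word] = counts.get(word, 0) + 1

    scored = []
    # Enumerate all 4**k candidates, including those never observed.
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        p = 1.0
        for b in word:
            p *= freq[b]
        mean = positions * p
        var = positions * p * (1.0 - p)
        if var == 0:
            continue  # word is impossible (or certain) under the background
        z = (counts.get(word, 0) - mean) / sqrt(var)
        scored.append((z, word))

    scored.sort(reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Toy input with the word CGCG planted in every sequence.
    toy = ["AAAACGCGAAAA", "TTTTCGCGTTTT", "AATTCGCGTTAA"]
    for z, word in top_motifs_by_zscore(toy, k=4, top=3):
        print(f"{word}\t{z:.2f}")
```

The function returns `(z, word)` pairs sorted from highest to lowest z-score. Enumeration costs time proportional to 4^k, which is practical only for short motifs; a more realistic background (for example a higher-order Markov model) and a variance that accounts for self-overlapping words would change the scores but not the exhaustive structure.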