Wei Zhi, Jensen Shane T
Genomics and Computational Biology Graduate Group, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Bioinformatics. 2006 Jul 1;22(13):1577-84. doi: 10.1093/bioinformatics/btl147. Epub 2006 Apr 21.
Identification of transcription factor binding sites is an important aspect of the analysis of genetic regulation. Many programs have been developed for the de novo discovery of a binding motif (a collection of binding sites). Recently, a scoring function formulation was derived that allows motifs discovered by different programs to be compared [S.T. Jensen, X.S. Liu, Q. Zhou and J.S. Liu (2004) Stat. Sci., 19, 188-204]. A simple program, BioOptimizer, was proposed in [S.T. Jensen and J.S. Liu (2004) Bioinformatics, 20, 1557-1564] that improves discovered motifs by optimizing a scoring function. However, BioOptimizer can only make local improvements to an already discovered motif, so it can only be used in conjunction with other motif-finding software.
We introduce software, GAME, which uses a genetic algorithm to find optimal motifs in DNA sequences. GAME evolves motifs with high fitness from a population of randomly generated starting motifs, which eliminates the reliance on additional motif-finding programs. In addition to standard genetic operations, GAME incorporates two operators that are specific to the motif discovery problem. We demonstrate the superior performance of GAME compared with MEME, BioProspector and BioOptimizer in simulation studies as well as in several real data applications, where we use an extended version of the GAME algorithm that allows the motif width to be unknown.
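The general idea of evolving motifs from a random starting population can be illustrated with a minimal sketch. This is not the GAME implementation: the fitness below is a plain information-content score against a uniform background (a stand-in for the Bayesian scoring function cited above), the encoding of a candidate motif as one start position per sequence is an assumption, and the two motif-specific operators are omitted because the abstract does not describe them. The names `fitness` and `evolve` and all parameter defaults are hypothetical.

```python
import math
import random

BASES = "ACGT"

def fitness(seqs, starts, width, pseudo=0.5):
    """Information content (bits) of the alignment given one start per sequence.

    Assumption: uniform background and a simple pseudocount; this is a
    stand-in score, not the scoring function used by GAME or BioOptimizer.
    """
    total = len(seqs) + 4 * pseudo
    score = 0.0
    for j in range(width):
        counts = {b: pseudo for b in BASES}
        for seq, pos in zip(seqs, starts):
            counts[seq[pos + j]] += 1
        for b in BASES:
            f = counts[b] / total
            score += f * math.log2(f / 0.25)
    return score

def evolve(seqs, width, pop_size=40, generations=60, mut_rate=0.2, seed=0):
    """Generic genetic algorithm: truncation selection, one-point crossover,
    random-reset mutation. Needs at least two sequences and pop_size >= 4."""
    rng = random.Random(seed)
    max_start = [len(s) - width for s in seqs]

    # Each individual is a list of start positions, one per sequence.
    pop = [[rng.randint(0, m) for m in max_start] for _ in range(pop_size)]

    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(seqs, ind, width), reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, len(seqs) - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            for i, m in enumerate(max_start):
                if rng.random() < mut_rate:  # re-draw a start position
                    child[i] = rng.randint(0, m)
            children.append(child)
        pop = parents + children

    best = max(pop, key=lambda ind: fitness(seqs, best_or(ind), width))
    return best, fitness(seqs, best, width)

def best_or(ind):
    """Identity helper kept separate so the selection key reads clearly."""
    return ind
```

Calling `evolve(seqs, width)` on a list of equal-alphabet DNA strings returns the best set of start positions found and its score. Because the fitter half of the population is carried over unchanged each generation, the best score never decreases from one generation to the next, though nothing guarantees a global optimum.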