Shida Kazuhito
Tohoku University Biomedical Engineering Research Organization, Sendai 980-8575, Japan.
Genome Inform. 2006;17(2):3-13.
The difficulty of computationally discovering transcription factor binding sites (TFBS) is well represented by the (l, d) planted motif challenge problems. Problems with large d are difficult, particularly for profile-based motif discovery algorithms, whose local search in profile space appears to be incompatible with subtle motifs and with large mutational distances between motif occurrences. Herein, an improved profile-based method called GibbsDST is described and tested on the (15,4), (12,3), and (18,6) challenge problems. Its performance on these problems is comparable to that of Random Projection, a first for a profile-based method. Notably, GibbsDST outperforms a pattern-based algorithm, WINNOWER, in some cases. The effectiveness of GibbsDST on a biological dataset and a possible extension of the method to more realistic evolution models are also presented.
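For readers unfamiliar with the (l, d) planted motif problem, the following is a minimal sketch of how one challenge instance can be generated. It is an illustration only and is not taken from the GibbsDST paper: the function names, the default of 20 sequences of length 600, the uniform background, and the choice to plant exactly d substitutions per occurrence are all assumptions made here.

```python
import random

ALPHABET = "ACGT"


def mutate(motif, d, rng):
    """Return a copy of motif with exactly d positions substituted."""
    chars = list(motif)
    for p in rng.sample(range(len(motif)), d):
        # Pick a different base so the position really changes.
        chars[p] = rng.choice([c for c in ALPHABET if c != chars[p]])
    return "".join(chars)


def planted_instance(l, d, n_seqs=20, seq_len=600, seed=0):
    """Build one (l, d) instance (assumed sizes: 20 sequences of length 600)."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, sites = [], []
    for _ in range(n_seqs):
        # Uniform random background (an assumption of this sketch).
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        start = rng.randrange(seq_len - l + 1)
        # Overwrite l background bases with a mutated copy of the motif.
        background[start:start + l] = mutate(motif, d, rng)
        seqs.append("".join(background))
        sites.append(start)
    return motif, seqs, sites


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


motif, seqs, sites = planted_instance(15, 4)
```

Each planted occurrence lies at Hamming distance d from the hidden motif, so two occurrences can differ from each other in up to 2d positions. That is the large mutational distance between occurrences that the abstract describes as hard for profile-based local search.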