Sun Chunxiao, Huo Hongwei, Yu Qiang, Guo Haitao, Sun Zhigang
School of Computer Science and Technology, Xidian University, Xi'an 710071, China.
Biomed Res Int. 2015;2015:853461. doi: 10.1155/2015/853461. Epub 2015 Aug 10.
The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics and plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Identifying weak motifs and reducing the effect of local optima remain important but challenging tasks in motif discovery. To address them, we propose a new algorithm, APMotif, which first applies Affinity Propagation (AP) clustering to the DNA sequences to produce informative candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from those candidates. Experimental results on both simulated and real biological data sets show that APMotif usually achieves higher prediction accuracy than four other widely used algorithms.
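To make the problem statement concrete, the sketch below checks the usual definition of a planted (l, d) motif: a string of length l is a motif for a set of sequences if every sequence contains at least one l-mer within Hamming distance d of it. This is only an illustration of the problem definition, not the APMotif algorithm; the function names (`hamming`, `best_site`, `is_planted_motif`) and the toy sequences are assumptions introduced here.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_site(motif: str, seq: str) -> Tuple[int, int]:
    """Return (position, distance) of the l-mer in seq closest to motif.

    Ties are resolved in favour of the leftmost position.
    """
    l = len(motif)
    if len(seq) < l:
        raise ValueError("sequence is shorter than the motif")
    candidates = (
        (i, hamming(motif, seq[i:i + l])) for i in range(len(seq) - l + 1)
    )
    return min(candidates, key=lambda t: t[1])


def is_planted_motif(motif: str, seqs: List[str], d: int) -> bool:
    """True if every sequence has an l-mer within Hamming distance d of motif."""
    return all(best_site(motif, s)[1] <= d for s in seqs)


# Toy data (hypothetical): an (8, 2) instance with one exact and one mutated site.
toy_seqs = ["TTACGTACGTGG", "GGACCTACGAAA"]
toy_motif = "ACGTACGT"
```

With these toy inputs, `is_planted_motif(toy_motif, toy_seqs, 2)` holds because the second sequence contains `ACCTACGA`, which differs from the motif at two positions. Weak motifs in the sense of the abstract are instances where d is large relative to l, so many random l-mers also fall within distance d.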