Nguyen Ken D, Pan Yi
Department of Computer Science, Georgia State University, Atlanta, GA 30303-3994, USA.
Int J Bioinform Res Appl. 2011;7(2):168-82. doi: 10.1504/IJBRA.2011.040095.
Aligning multiple DNA/RNA/protein sequences to identify common functionalities, structures, or relationships between species is a fundamental task in bioinformatics. In this study, we propose a new multiple sequence alignment strategy that extracts sequence information and global and local sequence similarities to assign a different weight to each input sequence. A weighted pair-wise distance matrix is calculated from these sequences and used to build a dynamic alignment guiding tree. The tree can reorder its higher-level branches based on the alignment results from lower tree levels, to guarantee the highest alignment score at each level of the tree. On many of the benchmarks tested, this technique improves alignment accuracy by up to 10% over multiple sequence alignment tools such as CLUSTALW (Thompson et al., 1994), DIALIGN (Morgenstern, 1999), T-COFFEE (Notredame et al., 2000), MUSCLE (Edgar, 2004), and PROBCONS (Do et al., 2005).
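To make the pipeline described above concrete, the following is a minimal sketch of the general idea of turning a weighted pair-wise distance matrix into a guide tree. It is not the authors' method: the abstract does not specify how the weights are derived or combined, so the per-sequence weights, their multiplicative use, the crude position-wise identity distance, and the function names `identity_distance`, `weighted_distance_matrix`, and `upgma_guide_tree` are all assumptions made for illustration. The tree is built with standard average-linkage (UPGMA) clustering, and the dynamic reordering of higher-level branches is not attempted here.

```python
from itertools import combinations


def identity_distance(a, b):
    """Crude distance (an assumption): 1 minus the fraction of positions
    that match, measured against the longer sequence. No alignment is done."""
    matches = sum(x == y for x, y in zip(a, b))
    return 1.0 - matches / max(len(a), len(b))


def weighted_distance_matrix(seqs, weights):
    """Symmetric matrix of pair-wise distances, each scaled by the product
    of the two sequence weights (a hypothetical weighting scheme)."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        value = weights[i] * weights[j] * identity_distance(seqs[i], seqs[j])
        dist[i][j] = dist[j][i] = value
    return dist


def upgma_guide_tree(dist):
    """Average-linkage (UPGMA) clustering.
    Returns the guide tree as nested tuples of sequence indices."""
    n = len(dist)
    clusters = {i: (i, 1) for i in range(n)}  # cluster id -> (subtree, size)
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters, a < b
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average distance to the merged cluster.
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

Calling `upgma_guide_tree(weighted_distance_matrix(seqs, weights))` on a list of sequences yields a nested tuple that a progressive aligner could traverse from the leaves upward, aligning the closest sequences first. A real implementation would replace `identity_distance` with scores from actual global and local pair-wise alignments.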