Multiple sequence alignment based on dynamic weighted guidance tree.

Author information

Nguyen Ken D, Pan Yi

Affiliation

Department of Computer Science, Georgia State University, Atlanta, GA 30303-3994, USA.

Publication information

Int J Bioinform Res Appl. 2011;7(2):168-82. doi: 10.1504/IJBRA.2011.040095.

DOI:10.1504/IJBRA.2011.040095

PMID:21576075

Abstract

Aligning multiple DNA, RNA, or protein sequences to identify common functions, structures, or relationships between species is a fundamental task in bioinformatics. In this study, we propose a new multiple sequence alignment strategy that extracts sequence information, together with global and local sequence similarities, to assign a different weight to each input sequence. A weighted pairwise distance matrix is calculated from these sequences and used to build a dynamic alignment guide tree. The tree can reorder its higher-level branches based on the alignment results from lower tree levels, so as to guarantee the highest alignment score at each level of the tree. On many benchmarks, this technique improves alignment accuracy by up to 10% compared with multiple sequence alignment tools such as CLUSTALW (Thompson et al., 1994), DIALIGN (Morgenstern, 1999), T-COFFEE (Notredame et al., 2000), MUSCLE (Edgar, 2004), and PROBCONS (Do et al., 2005).
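
The general idea of a weighted pairwise distance matrix feeding a guide tree can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the weights or distances are computed, so the k-mer Jaccard distance, the mean-distance weighting, and the average-linkage clustering below are all assumptions chosen for illustration, and every function name is hypothetical. The dynamic reordering of higher-level branches is not implemented here.

```python
from itertools import combinations


def kmers(seq, k=2):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Assumed distance: 1 minus the Jaccard similarity of the k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def weighted_distance_matrix(seqs, k=2):
    """Build a raw distance matrix, then scale each entry by sequence weights.

    Assumed weighting: a sequence's weight is its mean raw distance to the
    other sequences, so divergent sequences receive larger weights.
    """
    n = len(seqs)
    raw = [[kmer_distance(seqs[i], seqs[j], k) for j in range(n)]
           for i in range(n)]
    weights = [sum(row) / (n - 1) if n > 1 else 1.0 for row in raw]
    weighted = [[raw[i][j] * (weights[i] + weights[j]) / 2 for j in range(n)]
                for i in range(n)]
    return weighted, weights


def build_guide_tree(dist):
    """Average-linkage agglomerative clustering over a distance matrix.

    Returns a nested tuple of sequence indices. Expects at least one sequence.
    """
    # cluster id -> (subtree, member indices)
    clusters = {i: (i, [i]) for i in range(len(dist))}
    next_id = len(dist)

    def average_linkage(a, b):
        members_a, members_b = clusters[a][1], clusters[b][1]
        total = sum(dist[x][y] for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: average_linkage(*pair))
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    sequences = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
    dist, weights = weighted_distance_matrix(sequences)
    print(build_guide_tree(dist))
```

In a progressive aligner, the nested tuple would set the order in which sequences and sub-alignments are merged. A dynamic variant of the kind the abstract describes would revisit the remaining merge order after each lower-level alignment is scored, rather than fixing the tree up front as this sketch does.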
