Gonnet G H, Korostensky C, Benner S
Institute for Scientific Computing, ETH Zurich, Switzerland.
J Comput Biol. 2000 Feb-Apr;7(1-2):261-76. doi: 10.1089/10665270050081513.
Multiple sequence alignments (MSAs) are frequently used to study families of protein, DNA, or RNA sequences. They are a fundamental tool for understanding the structure, function, and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree, and the errors that it would introduce, can be avoided altogether. The algorithm gives an upper bound: the best score that any MSA can possibly achieve for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score that serves as an absolute measure of the quality of that MSA. The CS measure yields a direct connection between an MSA and the associated evolutionary tree, and it can be used as a tool for evaluating different methods for producing MSAs; a brief example of this application is provided. Because it weights all evolutionary events on a tree identically, the CS algorithm has an advantage over the frequently used sum-of-pairs measures for scoring MSAs, which weight some evolutionary events more strongly than others. Compared to weighted sum-of-pairs measures, it has the advantage that no evolutionary tree must be constructed, because a circular tour can be found without knowing the tree.
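The weighting argument can be checked on a toy case. The sketch below is illustrative only and is not the authors' implementation: the four-leaf tree, its edge lengths, and every function name are invented for the example; it uses additive distances rather than alignment scores; and a brute-force search stands in for a real Traveling Salesman solver. The tree is known here only so that edge usage can be counted. It shows that the shortest circular tour crosses every tree edge exactly twice, so half the tour length equals the total tree length, whereas summing over all pairs crosses the internal edge more often than the leaf edges.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical unrooted tree: leaves A-D, internal nodes x and y.
# Edge lengths are invented for illustration.
EDGES = {("A", "x"): 1, ("B", "x"): 2, ("x", "y"): 3, ("C", "y"): 4, ("D", "y"): 5}
LEAVES = ["A", "B", "C", "D"]


def path_edges(a, b):
    """Return the edges on the unique tree path from node a to node b."""
    adj = {}
    for u, v in EDGES:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack = [(a, None, [])]
    while stack:
        node, parent, path = stack.pop()
        if node == b:
            return path
        for nxt in adj[node]:
            if nxt != parent:
                # Normalise the edge to the orientation used as a key in EDGES.
                edge = (node, nxt) if (node, nxt) in EDGES else (nxt, node)
                stack.append((nxt, node, path + [edge]))
    raise ValueError(f"no path from {a} to {b}")


def distance(a, b):
    """Additive tree distance between two nodes."""
    return sum(EDGES[e] for e in path_edges(a, b))


def tour_pairs(order):
    """Consecutive pairs along a circular order, including the closing pair."""
    return [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]


def tour_length(order):
    return sum(distance(a, b) for a, b in tour_pairs(order))


def shortest_circular_tour(leaves):
    """Brute-force TSP over the leaves; feasible only for tiny inputs."""
    first, rest = leaves[0], leaves[1:]
    candidates = ([first] + list(p) for p in permutations(rest))
    return min(candidates, key=tour_length)


def edge_usage(pairs):
    """Count how often each tree edge lies on a path between the given pairs."""
    usage = Counter()
    for a, b in pairs:
        usage.update(path_edges(a, b))
    return usage


tour = shortest_circular_tour(LEAVES)
circular = edge_usage(tour_pairs(tour))          # usage along the circular tour
all_pairs = edge_usage(combinations(LEAVES, 2))  # usage in a plain sum of pairs
tree_length = sum(EDGES.values())
circular_sum = tour_length(tour) / 2

print("tour:", tour)
print("edge usage on tour:", dict(circular))
print("edge usage over all pairs:", dict(all_pairs))
print("half tour length:", circular_sum, "tree length:", tree_length)
```

Note that the tour search itself reads only pairwise distances; the tree is consulted here solely to produce those distances and to count edge usage, which is consistent with the abstract's point that a circular tour can be found without reconstructing the tree.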