Hein J
NIEHS, Research Triangle Park, North Carolina 27709.
Mol Biol Evol. 1989 Nov;6(6):649-68. doi: 10.1093/oxfordjournals.molbev.a040577.
Among the fundamental problems in molecular evolution and in the analysis of homologous sequences are alignment, phylogeny reconstruction, and the reconstruction of ancestral sequences. This paper presents a fast, combined solution to these problems. The new algorithm gives an approximation to the minimal history in terms of a distance function on sequences. The distance function on sequences is a minimal weighted path length constructed from substitutions and insertions-deletions of segments of any length. Substitutions are weighted with an arbitrary metric on the set of nucleotides or amino acids, and indels are weighted with a gap penalty function of the form gk = a + (bxk), where k is the length of the indel and a and b are two positive numbers. A novel feature is the introduction of the concept of sequence graphs and a generalization of the traditional dynamic sequence comparison algorithm to the comparison of sequence graphs. Sequence graphs ease several computational problems. They are used to represent large sets of sequences that can then be compared simultaneously. Furthermore, they allow the handling of multiple, equally good, alignments, where previous methods were forced to make arbitrary choices. A program written in C implemented this method; it was tested first on 22 5S RNA sequences.
分子进化和同源序列分析中的基本问题包括序列比对、系统发育重建和祖先序列重建。本文提出了针对这些问题的快速综合解决方案。新算法根据序列上的距离函数给出了对最小历史的近似。序列上的距离函数是由任意长度片段的替换和插入-缺失构建的最小加权路径长度。替换用核苷酸或氨基酸集上的任意度量加权,插入缺失用形式为gk = a + (bxk)的间隙罚分函数加权,其中k是插入缺失的长度,a和b是两个正数。一个新颖的特点是引入了序列图的概念,并将传统的动态序列比较算法推广到序列图的比较。序列图简化了几个计算问题。它们用于表示然后可以同时进行比较的大量序列集。此外,它们允许处理多个同样好的比对,而以前的方法则被迫做出任意选择。用C语言编写的一个程序实现了这种方法;它首先在22个5S RNA序列上进行了测试。