Sankoff D, Cedergren R J, Lapalme G
J Mol Evol. 1976 Mar 29;7(2):133-49. doi: 10.1007/BF01732471.
The problem of choosing an alignment of two or more nucleotide sequences is particularly difficult for nucleic acids, such as 5S ribosomal RNA, which do not code for protein and for which secondary structure is unknown. Given a set of 'costs' for the various types of replacement mutations and for base insertion or deletion, we present a dynamic programming algorithm which finds the optimal (least costly) alignment for a set of N sequences simultaneously, where each sequence is associated with one of the N tips of a given evolutionary tree. Concurrently, protosequences are constructed corresponding to the ancestral nodes of the tree. A version of this algorithm, modified to be computationally feasible, is implemented to align the sequences of 5S RNA from nine organisms. Complete sets of alignments and protosequence reconstructions are done for a large number of different configurations of mutation costs. Examination of the family of curbes of total replacements inferred versus the ratio of transitions/transversions inferred, each curve corresponding to a given number of insertions-deletions inferred, provides a method for estimating relative costs and relative frequencies for these different types of mutations.
对于诸如5S核糖体RNA这类不编码蛋白质且二级结构未知的核酸而言,选择两个或更多核苷酸序列的比对问题尤为困难。给定各种类型替换突变以及碱基插入或缺失的“代价”,我们提出一种动态规划算法,该算法能同时为一组N个序列找到最优(代价最小)比对,其中每个序列与给定进化树的N个末端之一相关联。同时,构建与树的祖先节点相对应的原始序列。该算法的一个经过修改以使其在计算上可行的版本被用于比对来自9种生物体的5S RNA序列。针对大量不同的突变代价配置完成了完整的比对集和原始序列重建。对推断的总替换曲线族与推断的转换/颠换比率进行考察,每条曲线对应于推断的给定数量的插入 - 缺失,这提供了一种估计这些不同类型突变的相对代价和相对频率的方法。