序列组之间的最优比对及其在多序列比对中的应用。

Optimal alignment between groups of sequences and its application to multiple sequence alignment.

作者信息

Gotoh O

机构信息

Department of Biochemistry, Saitama Cancer Center, Japan.

出版信息

Comput Appl Biosci. 1993 Jun;9(3):361-70. doi: 10.1093/bioinformatics/9.3.361.

DOI:10.1093/bioinformatics/9.3.361

PMID:8324637

Abstract

Four algorithms, A-D, were developed to align two groups of biological sequences. Algorithm A is equivalent to the conventional dynamic programming method widely used for aligning ordinary sequences, whereas algorithms B-D are designed to evaluate the cost for a deletion/insertion more accurately when internal gaps are present in either or both groups of sequences. Rigorous optimization of the 'sum of pairs' (SP) score is achieved by algorithm D, whose average performance is close to O(MNL2), where M and N are numbers of sequences included in the two groups and L is the mean length of the sequences. Algorithm B uses some approximations to cope with profile-based operations, whereas algorithm C is a simpler variant of algorithm D. These group-to-group alignment algorithms were applied to multiple sequence alignment with two iterative strategies: a progressive method based on a given binary tree and a randomized grouping--realignment method. The advantages and disadvantages of the four algorithms are discussed on the basis of the results of examinations of several protein families.

摘要

开发了四种算法A - D，用于比对两组生物序列。算法A等同于广泛用于比对普通序列的传统动态规划方法，而算法B - D旨在当两组序列中的一组或两组存在内部空位时，更准确地评估缺失/插入的代价。算法D实现了对“配对总和”（SP）分数的严格优化，其平均性能接近O(MNL2)，其中M和N是两组中包含的序列数量，L是序列的平均长度。算法B使用一些近似方法来处理基于轮廓的操作，而算法C是算法D的一个更简单变体。这些组间比对算法通过两种迭代策略应用于多序列比对：一种基于给定二叉树的渐进方法和一种随机分组 - 重新比对方法。基于对几个蛋白质家族的检测结果，讨论了这四种算法的优缺点。