Morgenstern B, Dress A, Werner T
National Research Center for Environment and Health, Institute of Mammalian Genetics, Neuherberg, Germany.
Proc Natl Acad Sci U S A. 1996 Oct 29;93(22):12098-103. doi: 10.1073/pnas.93.22.12098.
In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
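The segment-to-segment idea can be illustrated with a small pairwise sketch. This is not the authors' program: it assumes, purely for illustration, that a "segment" is a gap-free run of identical residues and that its score is simply its length, whereas the abstract speaks of segments with highly significant overall similarity. The names `Segment`, `exact_match_segments`, `best_consistent_chain`, and the `min_len` threshold are hypothetical. For two sequences, "consistent" is taken here to mean that the chosen segments do not overlap and appear in the same order in both sequences. No gap penalty appears anywhere; gaps are just the positions left outside the chosen segments.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """A gap-free pair of equal-length stretches, one from each sequence."""
    start_a: int
    start_b: int
    length: int


def exact_match_segments(a: str, b: str, min_len: int = 3) -> List[Segment]:
    """Collect maximal runs of identical residues along every dot-matrix diagonal.

    Assumption: exact identity stands in for a real similarity measure.
    """
    segments = []
    for offset in range(-(len(a) - 1), len(b)):  # offset = j - i
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    segments.append(Segment(i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # run that reaches the end of the diagonal
            segments.append(Segment(i - run, j - run, run))
    return segments


def best_consistent_chain(segments: List[Segment]) -> List[Segment]:
    """Pick the highest-scoring set of segments that can coexist in one alignment.

    Two segments are compatible when one ends before the other starts
    in both sequences. Score of a segment = its length (an assumption).
    """
    if not segments:
        return []
    ordered = sorted(
        segments, key=lambda s: (s.start_a + s.length, s.start_b + s.length)
    )
    best = [s.length for s in ordered]   # best chain score ending at each segment
    prev = [-1] * len(ordered)           # back-pointers for traceback
    for k, s in enumerate(ordered):
        for m in range(k):
            p = ordered[m]
            fits = (p.start_a + p.length <= s.start_a
                    and p.start_b + p.length <= s.start_b)
            if fits and best[m] + s.length > best[k]:
                best[k] = best[m] + s.length
                prev[k] = m
    k = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    a = "xxHELIXyyyLOOPzz"
    b = "HELIXqLOOPw"
    for seg in best_consistent_chain(exact_match_segments(a, b)):
        print(seg, a[seg.start_a:seg.start_a + seg.length])
```

In this toy setting the only tunable value is `min_len`; a statistically motivated segment weight would replace both the identity test and the length-based score.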