Multiple DNA and protein sequence alignment based on segment-to-segment comparison.

Authors

Morgenstern B, Dress A, Werner T

Affiliation

National Research Center for Environment and Health, Institute of Mammalian Genetics, Neuherberg, Germany.

Publication

Proc Natl Acad Sci U S A. 1996 Oct 29;93(22):12098-103. doi: 10.1073/pnas.93.22.12098.

DOI:10.1073/pnas.93.22.12098

PMID:8901539

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC37949/

Abstract

In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and which (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
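The abstract's central idea, building an alignment from whole ungapped segment pairs that must be mutually consistent and leaving everything else unaligned instead of charging gap penalties, can be illustrated for two sequences with a small Python sketch. It is not the paper's algorithm: the toy score (matches minus mismatches), the minimum segment length `min_len`, the greedy best-first selection, and all names (`Diagonal`, `enumerate_diagonals`, `consistent`, `greedy_segment_alignment`, `aligned_pairs`) are hypothetical simplifications chosen for illustration. The consistency test used here (two segment pairs may neither overlap nor cross) is also stricter than the general equivalence-relation formulation over many sequences described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Diagonal:
    """An ungapped segment pair: a[i:i+length] aligned with b[j:j+length]."""
    i: int
    j: int
    length: int
    score: int


def enumerate_diagonals(a: str, b: str, min_len: int = 3) -> List[Diagonal]:
    """List every diagonal of length >= min_len that has a positive toy score.

    Hypothetical score: +1 per identical pair, -1 per mismatch. This stands in
    for, and does not reproduce, the paper's significance-based selection.
    """
    diags = []
    for i in range(len(a)):
        for j in range(len(b)):
            score = 0
            for k in range(min(len(a) - i, len(b) - j)):
                score += 1 if a[i + k] == b[j + k] else -1
                if k + 1 >= min_len and score > 0:
                    diags.append(Diagonal(i, j, k + 1, score))
    return diags


def consistent(d: Diagonal, e: Diagonal) -> bool:
    """True if d lies entirely before or entirely after e in BOTH sequences,
    i.e. the two diagonals neither overlap nor cross."""
    d_before_e = d.i + d.length <= e.i and d.j + d.length <= e.j
    e_before_d = e.i + e.length <= d.i and e.j + e.length <= d.j
    return d_before_e or e_before_d


def greedy_segment_alignment(a: str, b: str, min_len: int = 3) -> List[Diagonal]:
    """Accept diagonals best-first while they stay consistent with every
    diagonal already accepted. No gap penalty appears anywhere: residues
    outside the accepted diagonals simply remain unaligned."""
    candidates = sorted(
        enumerate_diagonals(a, b, min_len),
        key=lambda d: (-d.score, d.i, d.j),
    )
    accepted: List[Diagonal] = []
    for d in candidates:
        if all(consistent(d, e) for e in accepted):
            accepted.append(d)
    return sorted(accepted, key=lambda d: d.i)


def aligned_pairs(diags: List[Diagonal]) -> List[Tuple[int, int]]:
    """Expand the accepted diagonals into the residue pairs they align."""
    return [(d.i + k, d.j + k) for d in diags for k in range(d.length)]


if __name__ == "__main__":
    result = greedy_segment_alignment("AAACCCGGG", "AAAGGG")
    print(result)
    print(aligned_pairs(result))
```

With `a = "AAACCCGGG"` and `b = "AAAGGG"`, the sketch accepts `Diagonal(0, 0, 3, 3)` and `Diagonal(6, 3, 3, 3)`; the `CCC` block of `a` belongs to no accepted segment and therefore plays the role of a gap without any gap penalty having been chosen. The brute-force enumeration costs roughly O(n·m·min(n, m)) and handles only two sequences, so it illustrates the concept rather than a practical implementation.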

Cited by

Bioinformatic Prediction and High Throughput In Vivo Screening to Identify Cis-Regulatory Elements for the Development of Algal Synthetic Promoters.

ACS Synth Biol. 2024 Jul 19;13(7):2150-2165. doi: 10.1021/acssynbio.4c00199. Epub 2024 Jul 10.

An overview of technologies for MS-based proteomics-centric multi-omics.

Expert Rev Proteomics. 2022 Mar;19(3):165-181. doi: 10.1080/14789450.2022.2070476. Epub 2022 May 2.

TCP and MADS-Box Transcription Factor Networks Regulate Heteromorphic Flower Type Identity in .

Plant Physiol. 2020 Nov;184(3):1455-1468. doi: 10.1104/pp.20.00702. Epub 2020 Sep 8.

Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points.

Bioinformatics. 2019 Jan 15;35(2):211-218. doi: 10.1093/bioinformatics/bty592.

PnpProbs: a better multiple sequence alignment tool by better handling of guide trees.

BMC Bioinformatics. 2016 Aug 31;17 Suppl 8(Suppl 8):285. doi: 10.1186/s12859-016-1121-7.

An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm.

EXCLI J. 2015 Dec 15;14:1232-55. doi: 10.17179/excli2015-302. eCollection 2015.

Aligning the unalignable: bacteriophage whole genome alignments.

BMC Bioinformatics. 2016 Jan 13;17:30. doi: 10.1186/s12859-015-0869-5.

Evolution of C4 photosynthesis in the genus Flaveria: establishment of a photorespiratory CO2 pump.

Plant Cell. 2013 Jul;25(7):2522-35. doi: 10.1105/tpc.113.114520. Epub 2013 Jul 11.

DIALIGN at GOBICS--multiple sequence alignment using various sources of external information.

Nucleic Acids Res. 2013 Jul;41(Web Server issue):W3-7. doi: 10.1093/nar/gkt283. Epub 2013 Apr 24.

Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment.

BMC Bioinformatics. 2011 Aug 25;12:353. doi: 10.1186/1471-2105-12-353.

References

Motif-biased protein sequence alignment.

J Comput Biol. 1994 Winter;1(4):297-310. doi: 10.1089/cmb.1994.1.297.

Weighting in sequence space: a comparison of methods in terms of generalized sequences.

Proc Natl Acad Sci U S A. 1993 Oct 1;90(19):8777-81. doi: 10.1073/pnas.90.19.8777.

Optimal alignment between groups of sequences and its application to multiple sequence alignment.

Comput Appl Biosci. 1993 Jun;9(3):361-70. doi: 10.1093/bioinformatics/9.3.361.

Sequence alignment and penalty choice. Review of concepts, case studies and implications.

J Mol Biol. 1994 Jan 7;235(1):1-12. doi: 10.1016/s0022-2836(05)80006-3.

Comparative analysis of multiple protein-sequence alignment methods.

Mol Biol Evol. 1994 Jul;11(4):571-92. doi: 10.1093/oxfordjournals.molbev.a040138.

Protein Eng. 1994 Oct;7(10):1175-87. doi: 10.1093/protein/7.10.1175.

A general method applicable to the search for similarities in the amino acid sequence of two proteins.

J Mol Biol. 1970 Mar;48(3):443-53. doi: 10.1016/0022-2836(70)90057-4.

Simultaneous comparison of three protein sequences.

Proc Natl Acad Sci U S A. 1985 May;82(10):3073-7. doi: 10.1073/pnas.82.10.3073.

A flexible multiple sequence alignment program.

Nucleic Acids Res. 1988 Mar 11;16(5):1683-91. doi: 10.1093/nar/16.5.1683.

Progressive sequence alignment as a prerequisite to correct phylogenetic trees.

J Mol Evol. 1987;25(4):351-60. doi: 10.1007/BF02603120.