Swaps in protein sequences.

Fliess Amit, Motro Benny, Unger Ron

Faculty of Life Science, Bar-Ilan University, Ramat-Gan, Israel.

Proteins. 2002 Aug 1;48(2):377-87. doi: 10.1002/prot.10156.

An important question in protein evolution is to what extent proteins may have undergone swaps (switches of domain or fragment order) during evolution. Such events might have occurred in several forms: swaps of short fragments, swaps of structural and functional motifs, or recombination of domains in multidomain proteins. This question is important for the theoretical understanding of the evolution of proteins, and has practical implications for using swaps as a design tool in protein engineering. To analyze the question systematically, we conducted a large-scale survey of possible swaps and permutations among all pairs of proteins from the Swissprot database. A swap is defined as a specific kind of sequence mutation between two proteins in which two fragments that appear in both sequences have a different relative order in the two sequences. For example, aXbYc and dYeXf are defined as a swap, where X and Y represent sequence fragments that switched their order. Identifying such swaps is difficult using standard sequence comparison packages. One of the main problems in the analysis stems from the fact that many sequences contain repeats, which may be identified as false-positive swaps. We used two different approaches to detect pairs of proteins with swaps. The first approach is based on the predefined list of domains in Pfam. We identified all the proteins that share at least two domains and analyzed their relative order, looking for pairs in which the order of these domains was switched. We designed an algorithm to distinguish between real swaps and duplications. In the second approach, we used Blast to detect pairs of proteins that share several fragments. Then, we used an automatic procedure to select pairs that are likely to contain swaps. Those pairs were analyzed visually, using a graphical tool, to eliminate duplications. Combining these approaches, about 140 different cases of swaps in the Swissprot database were found (after eliminating multiple pairs within the same family). Some of the cases have been described in the literature, but many are novel examples. Although each new example identified may be interesting to analyze, our main conclusion is that cases of swaps are rare in protein evolution. This observation is at odds with the common view that proteins are very modular to the point that modules (e.g., domains) can be shuffled between proteins with minimal constraints. Our study suggests that sequential constraints, i.e., the relative order between domains, are highly conserved.
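
The swap definition above (aXbYc versus dYeXf) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `find_swaps` and the list-of-labels input format are assumptions made for illustration, and the handling of repeats is deliberately crude. Any fragment that occurs more than once in either protein is simply skipped, because the abstract notes that repeats can be mistaken for swaps.

```python
from itertools import combinations


def find_swaps(arch_a, arch_b):
    """Return pairs (x, y) whose relative order differs between two proteins.

    arch_a, arch_b: lists of fragment or domain labels in N-to-C-terminal
    order (a hypothetical input format, e.g. a list of Pfam identifiers).
    Labels that occur more than once in either protein are ignored, since
    repeats can mimic a swap. This is a simplification, not the paper's method.
    """
    # Keep only labels that occur exactly once in each architecture.
    unique_a = {d for d in arch_a if arch_a.count(d) == 1}
    unique_b = {d for d in arch_b if arch_b.count(d) == 1}
    shared = unique_a & unique_b

    # Position of each shared label in each protein.
    pos_a = {d: arch_a.index(d) for d in shared}
    pos_b = {d: arch_b.index(d) for d in shared}

    swaps = []
    for x, y in combinations(sorted(shared), 2):
        # A swap: x precedes y in one protein but follows it in the other.
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y]):
            swaps.append((x, y))
    return swaps


# The example from the abstract: X and Y switch order.
print(find_swaps(list("aXbYc"), list("dYeXf")))
```

For the abstract's example, the only shared fragments are X and Y, and they are reported as a swapped pair. A pair such as X-Y-X versus Y-X yields no result, because X is repeated in the first protein and is therefore excluded from the comparison.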

Similar articles

Swaps in protein sequences.

Proteins. 2002 Aug 1;48(2):377-87. doi: 10.1002/prot.10156.

Evolution of circular permutations in multidomain proteins.

Mol Biol Evol. 2006 Apr;23(4):734-43. doi: 10.1093/molbev/msj091. Epub 2006 Jan 23.

Sequence and hydropathy profile analysis of two classes of secondary transporters.

Mol Membr Biol. 2005 May-Jun;22(3):177-89. doi: 10.1080/09687860500063324.

Automatic annotation of protein function based on family identification.

Proteins. 2003 Nov 15;53(3):683-92. doi: 10.1002/prot.10449.

The geometry of domain combination in proteins.

J Mol Biol. 2002 Jan 25;315(4):927-39. doi: 10.1006/jmbi.2001.5288.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Comprehensive analysis of orthologous protein domains using the HOPS database.

Genome Res. 2003 Oct;13(10):2353-62. doi: 10.1101/gr1305203.

Equivalent binding sites reveal convergently evolved interaction motifs.

Bioinformatics. 2006 Mar 1;22(5):550-5. doi: 10.1093/bioinformatics/bti782. Epub 2005 Nov 15.

CombAlign: a protein sequence comparison algorithm considering recombinations.

In Silico Biol. 2004;4(3):243-54.

Identification of putative domain linkers by a neural network - application to a large sequence database.

BMC Bioinformatics. 2006 Jun 27;7:323. doi: 10.1186/1471-2105-7-323.

Cited by

PhyloPro2.0: a database for the dynamic exploration of phylogenetically conserved proteins and their domain architectures across the Eukarya.

Database (Oxford). 2016 Mar 15;2016. doi: 10.1093/database/baw013. Print 2016.

New tricks for "old" domains: how novel architectures and promiscuous hubs contributed to the organization and evolution of the ECM.

Genome Biol Evol. 2014 Oct 15;6(10):2897-917. doi: 10.1093/gbe/evu228.

Predict impact of single amino acid change upon protein structure.

BMC Genomics. 2012 Jun 18;13 Suppl 4(Suppl 4):S4. doi: 10.1186/1471-2164-13-S4-S4.

Structural and functional characterization of the LldR from Corynebacterium glutamicum: a transcriptional repressor involved in L-lactate and sugar utilization.

Nucleic Acids Res. 2008 Dec;36(22):7110-23. doi: 10.1093/nar/gkn827. Epub 2008 Nov 6.

cpRAS: a novel circularly permuted RAS-like GTPase domain with a highly scattered phylogenetic distribution.

Biol Direct. 2008 May 29;3:21. doi: 10.1186/1745-6150-3-21.

A comprehensive analysis of non-sequential alignments between all protein structures.

BMC Struct Biol. 2007 Nov 16;7:78. doi: 10.1186/1472-6807-7-78.

Mapping sequences by parts.

Algorithms Mol Biol. 2007 Sep 19;2:11. doi: 10.1186/1748-7188-2-11.

Global extent of horizontal gene transfer.

Proc Natl Acad Sci U S A. 2007 Mar 13;104(11):4489-94. doi: 10.1073/pnas.0611557104. Epub 2007 Mar 7.