He David, Chung Martin, Chan Esther, Alleyne Trevis, Ha Kevin C H, Miao Ming, Stahl Richard J, Keeley Fred W, Parkinson John
Program in Molecular Structure and Function, Hospital for Sick Children, University of Toronto, 555 University Avenue, Toronto, Ontario Canada M5G 1X8.
Matrix Biol. 2007 Sep;26(7):524-40. doi: 10.1016/j.matbio.2007.05.005. Epub 2007 May 25.
Because their sequences are of low complexity, uncovering the evolutionary and functional relationships in highly repetitive proteins such as elastin, spider silks, resilin and abductin represents a significant challenge. Using the polymeric extracellular protein elastin as a model system, we present a novel computational approach to the study of sequence, function and evolutionary relationships in repetitive proteins. To address the absence of accurate sequence annotation for repetitive proteins such as elastin, we have constructed a new database repository, ElastoDB (http://theileria.ccb.sickkids.ca/elastin), dedicated to the storage and retrieval of elastin sequence data and metadata. To analyse their sequence relationships, we have devised a new method based on the identification of overrepresented 'fuzzy' motifs. Applying this method to elastin sequences derived from mammals, chicken, Xenopus and zebrafish identified both highly conserved motifs and taxon- and species-specific motifs that likely represent important functional and/or structural elements. The relative spacing and organization of these elements suggest that exon duplication events have played an important role in the evolution of elastin. Clustering of similarity profiles generated for sets of exons and introns revealed a pattern of putative duplication events involving exons 15-30 in mammalian and chicken elastins, exons 20-31 in both zebrafish elastins, exons 15-20 in fugu elastin and exons 35-50 in Xenopus elastin 1. The success of this approach for elastin offers a promising route to the elucidation of sequence, structure, function and evolutionary relationships for many other proteins with sequences of low complexity.
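The abstract does not describe how the overrepresented 'fuzzy' motifs are scored, so the following is only a minimal sketch of one common way to approach the idea, not the authors' method. It assumes that a fuzzy motif is a fixed-length k-mer matched with up to a given number of substitutions, and that overrepresentation is judged against composition-preserving shuffles of the same sequences. All function names, parameters, thresholds and the toy input strings are hypothetical and chosen for illustration.

```python
import random


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(seqs, k):
    """Yield every length-k window from every sequence."""
    for s in seqs:
        for i in range(len(s) - k + 1):
            yield s[i:i + k]


def fuzzy_count(motif, seqs, max_mismatch=1):
    """Count windows that match the motif with at most max_mismatch substitutions."""
    k = len(motif)
    return sum(hamming(motif, w) <= max_mismatch for w in windows(seqs, k))


def shuffle_seq(seq, rng):
    """Return a shuffled copy of seq (same residue composition, order destroyed)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def overrepresented_motifs(seqs, k=5, max_mismatch=1, n_shuffles=20,
                           min_ratio=2.0, seed=0):
    """Return (motif, observed, expected) tuples for candidate k-mers whose
    fuzzy count exceeds the shuffled-background mean by at least min_ratio.

    Assumption: the background is the mean fuzzy count over n_shuffles
    shuffled copies of the input; a pseudocount of 1 damps ratios for
    motifs that are rare in the background.
    """
    rng = random.Random(seed)
    candidates = set(windows(seqs, k))
    shuffles = [[shuffle_seq(s, rng) for s in seqs] for _ in range(n_shuffles)]
    results = []
    for motif in candidates:
        observed = fuzzy_count(motif, seqs, max_mismatch)
        expected = sum(fuzzy_count(motif, sh, max_mismatch)
                       for sh in shuffles) / n_shuffles
        if observed / (expected + 1.0) >= min_ratio:
            results.append((motif, observed, expected))
    # Most frequent fuzzy motifs first.
    return sorted(results, key=lambda r: -r[1])


if __name__ == "__main__":
    # Toy strings invented for illustration; they are not ElastoDB records.
    toy = ["AKAAGVGVPAKLGGVGVPRTSGVGAPQWEGVGVP",
           "LGGVGVPKAAKGVGIPGGVGVPAAKS"]
    for motif, obs, exp in overrepresented_motifs(toy)[:5]:
        print(motif, obs, round(exp, 2))
```

Matching by substitutions only is a deliberate simplification; a fuller treatment of fuzziness might also allow residue classes or gaps, which this sketch does not attempt.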