Size and structure of the sequence space of repeat proteins.
Affiliations
Laboratoire de physique de l'École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France.
Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina.
Publication
PLoS Comput Biol. 2019 Aug 15;15(8):e1007282. doi: 10.1371/journal.pcbi.1007282. eCollection 2019 Aug.
The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family, that is, the total number of sequences in that family, can be estimated using maximum entropy models trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed and calculated the size of three abundant repeat protein families, whose members are large proteins made of many repetitions of conserved portions of ∼30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes organized in a hierarchical structure, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family and suggests new strategies for protein design.
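The size of a coding space is commonly summarized through entropy: a distribution with entropy S (in nats) behaves roughly like a uniform distribution over e^S sequences. The sketch below illustrates the two effects named in the abstract on a tiny hypothetical alignment, assuming a 21-letter alphabet (20 amino acids plus gap). It computes the independent-site entropy (the reduction explained by per-position conservation), the random-sequence reference L ln 21, and the mutual information between two columns, which is exactly how much a correlation between those two positions lowers their joint entropy below the independent-site value. This is not the authors' method: their maximum entropy models fit correlations across all positions at once, and this sketch applies no pseudocounts or finite-sample corrections. The alignment `TOY_MSA` and all function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: 20 amino acids plus the gap symbol '-'.
ALPHABET_SIZE = 21

# Hypothetical toy alignment (one aligned repeat per row). Columns 0 and 2
# co-vary perfectly; columns 0 and 3 vary independently; column 1 is conserved.
TOY_MSA = ["ACDE", "ACDF", "GCHE", "GCHF"]


def column_entropy(column):
    """Shannon entropy (nats) of the empirical residue frequencies in one column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def independent_site_entropy(msa):
    """Entropy of a conservation-only model: the sum of per-column entropies."""
    length = len(msa[0])
    return sum(column_entropy([seq[i] for seq in msa]) for i in range(length))


def random_sequence_entropy(length):
    """Entropy of uniformly random sequences of the given length: L * ln(q)."""
    return length * math.log(ALPHABET_SIZE)


def mutual_information(msa, i, j):
    """Empirical mutual information (nats) between alignment columns i and j."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


s_random = random_sequence_entropy(len(TOY_MSA[0]))
s_indep = independent_site_entropy(TOY_MSA)
print(f"random:           S = {s_random:.3f} nats, ~{math.exp(s_random):.0f} sequences")
print(f"independent-site: S = {s_indep:.3f} nats, ~{math.exp(s_indep):.0f} sequences")
print(f"MI(col0, col2) = {mutual_information(TOY_MSA, 0, 2):.3f} nats (correlated)")
print(f"MI(col0, col3) = {mutual_information(TOY_MSA, 0, 3):.3f} nats (independent)")
```

In this toy case conservation alone shrinks the space from 21^4 sequences to about 2^3, and the perfect coupling between columns 0 and 2 removes a further ln 2 nats, which is why only four distinct sequences actually appear.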