Espada Rocío, Parra R Gonzalo, Mora Thierry, Walczak Aleksandra M, Ferreiro Diego U
Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica. Buenos Aires, Argentina. / CONICET - Universidad de Buenos Aires. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN). Buenos Aires, Argentina.
Quantitative and Computational Biology Group, Max Planck Institute for Biophysical Chemistry, Goettingen, Germany.
PLoS Comput Biol. 2017 Jun 15;13(6):e1005584. doi: 10.1371/journal.pcbi.1005584. eCollection 2017 Jun.
Natural protein sequences contain a record of their history. A common constraint in a given protein family is the ability to fold into specific structures, and it has been shown that the main native ensemble can be inferred by analyzing covariations in extant sequences. Still, many natural proteins that fold into the same structural topology show different stabilization energies, and these are often related to their physiological behavior. We propose a description of the energetic variation caused by sequence modifications in repeat proteins, systems in which the overall problem is simplified by their inherent symmetry. We explicitly account for single amino acid and pairwise interactions and treat higher-order correlations with a single term. We show that the resulting evolutionary field can be interpreted with structural detail. We trace the variations in the energetic scores of natural proteins and relate them to their experimental characterization. The resulting energetic evolutionary field allows the prediction of the folding free energy change for several mutants, and can be used to generate synthetic sequences that are statistically indistinguishable from their natural counterparts.
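As a rough illustration of the kind of scoring function the abstract describes (single amino acid terms, pairwise terms, and one extra term standing in for higher-order correlations), a minimal Python sketch follows. It is not the authors' implementation. The names `evolutionary_energy`, `mutation_delta`, `fields`, `couplings`, and `higher_order` are hypothetical, the sign convention (lower score means more favorable) is an assumption, and the abstract does not specify the form of the higher-order term, so it is left as an optional user-supplied function.

```python
from itertools import combinations


def evolutionary_energy(seq, fields, couplings, higher_order=None):
    """Potts-like score for a sequence (hypothetical sketch).

    fields:       dict mapping (position, residue) -> float
    couplings:    dict mapping (pos_i, pos_j, res_i, res_j) -> float, with pos_i < pos_j
    higher_order: optional callable seq -> float; a placeholder for the single
                  higher-order term, whose form the abstract does not specify
    Assumed convention: lower score = more favorable.
    """
    # Single-site contributions; missing parameters count as zero.
    e_single = sum(fields.get((i, a), 0.0) for i, a in enumerate(seq))
    # Pairwise contributions over all position pairs i < j.
    e_pair = sum(
        couplings.get((i, j, seq[i], seq[j]), 0.0)
        for i, j in combinations(range(len(seq)), 2)
    )
    e_extra = higher_order(seq) if higher_order is not None else 0.0
    return -(e_single + e_pair) + e_extra


def mutation_delta(seq, position, new_residue, fields, couplings, higher_order=None):
    """Score change for a point mutation: E(mutant) - E(original)."""
    mutant = seq[:position] + new_residue + seq[position + 1:]
    return (
        evolutionary_energy(mutant, fields, couplings, higher_order)
        - evolutionary_energy(seq, fields, couplings, higher_order)
    )


# Toy parameters, invented purely for illustration (not fitted to any data).
toy_fields = {(0, "A"): 1.0, (1, "L"): 0.5, (1, "G"): 0.1}
toy_couplings = {(0, 1, "A", "L"): 0.25}
delta = mutation_delta("AL", 1, "G", toy_fields, toy_couplings)
```

In a setting like the one the abstract outlines, a score difference of this kind would be the quantity compared against measured folding free energy changes for mutants. Any such comparison depends on parameters fitted to a real sequence alignment, which this sketch does not attempt.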