Center for Physics of Evolving Systems, Department of Biochemistry & Molecular Biology, University of Chicago, Chicago, IL 60637, USA.
Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
Cell Syst. 2023 Mar 15;14(3):210-219.e7. doi: 10.1016/j.cels.2022.12.013. Epub 2023 Jan 23.
Protein structure, function, and evolution depend on local and collective epistatic interactions between amino acids. A powerful approach to defining these interactions is to construct models of couplings between amino acids that reproduce the empirical statistics (frequencies and correlations) observed in the sequences comprising a protein family. The top couplings are then interpreted. Here, we show that, as currently implemented, this inference represents epistatic interactions unequally, a problem that fundamentally arises from limited sampling of sequences in the context of the distinct scales at which epistasis occurs in proteins. We show that these issues explain both the ability of current approaches to predict tertiary contacts between amino acids and their inability to clearly expose larger networks of functionally relevant, collectively evolving residues called sectors. This work provides a necessary foundation for more deeply understanding and improving evolution-based models of proteins.
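As a rough illustration of the "empirical statistics (frequencies and correlations)" mentioned above, the following sketch computes single-site frequencies, pairwise frequencies, and connected correlations from a toy alignment. It is not the inference procedure studied in the paper: the alphabet ordering, the function names (`one_hot`, `empirical_statistics`), the toy alignment, and the use of a Frobenius norm to summarize each position pair are all assumptions made for illustration, and no sequence reweighting, pseudocounts, or coupling inference is included.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(msa):
    """Encode equal-length aligned sequences as an array of shape (N, L, q)."""
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    n_seq, length = len(msa), len(msa[0])
    x = np.zeros((n_seq, length, len(ALPHABET)))
    for n, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            x[n, i, index[aa]] = 1.0
    return x


def empirical_statistics(msa):
    """Return f_i(a), f_ij(a, b), and C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b)."""
    x = one_hot(msa)
    n_seq = x.shape[0]
    freq = x.mean(axis=0)                            # shape (L, q)
    pair = np.einsum("nia,njb->ijab", x, x) / n_seq  # shape (L, L, q, q)
    corr = pair - freq[:, None, :, None] * freq[None, :, None, :]
    return freq, pair, corr


# Hypothetical toy alignment: positions 0 and 2 covary, position 1 is conserved.
toy_msa = ["ACD", "ACD", "GCE", "GCE"]
freq, pair, corr = empirical_statistics(toy_msa)

# Summarize each position pair by the Frobenius norm of its correlation block.
strength = np.sqrt((corr ** 2).sum(axis=(2, 3)))
print(np.round(strength, 3))
```

In this toy case the covarying pair (positions 0 and 2) receives a nonzero score, while any pair involving the conserved position scores zero. With only four sequences the estimates are dominated by sampling noise, which is the kind of limited-sampling regime the abstract identifies as the source of the problem.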