Meyerguz Leonid, Grasso Catherine, Kleinberg Jon, Elber Ron
Department of Computer Science, 4130 Upson Hall, Cornell University, Ithaca, NY 14853, USA.
Structure. 2004 Apr;12(4):547-57. doi: 10.1016/j.str.2004.02.018.
Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.
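The quantities named in the abstract can be illustrated on a toy model. The sketch below is not the authors' method: it assumes a hypothetical two-letter alphabet, a hand-picked contact map, and a made-up pair energy, and it uses the standard microcanonical definitions S(E) = ln Ω(E) and 1/T = dS/dE as stand-ins for the paper's entropy and "selection" temperature. It enumerates every sequence for one fixed structure, counts those with energy at or below a chosen "native" sequence, and estimates the temperature by a finite difference.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy model (assumption, not from the paper):
# residues are H or P, and only H-H contacts lower the energy.
ALPHABET = "HP"
CONTACTS = [(0, 3), (1, 4), (2, 5), (0, 5), (1, 6), (2, 7)]  # fixed "structure"
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0,
               ("P", "H"): 0.0, ("P", "P"): 0.0}


def energy(seq):
    """Energy of a sequence threaded onto the fixed contact map."""
    return sum(PAIR_ENERGY[(seq[i], seq[j])] for i, j in CONTACTS)


def energy_histogram(length):
    """Number of sequences at each energy level, by full enumeration."""
    return Counter(energy(s) for s in itertools.product(ALPHABET, repeat=length))


def count_at_or_below(hist, e_native):
    """Number of sequences whose energy does not exceed e_native."""
    return sum(n for e, n in hist.items() if e <= e_native)


def entropy(hist):
    """Assumed definition: S(E) = ln(number of sequences at energy E)."""
    return {e: math.log(n) for e, n in hist.items()}


def selection_temperature(hist, e_native):
    """Finite-difference estimate of T from the assumed relation 1/T = dS/dE."""
    levels = sorted(hist)
    k = levels.index(e_native)
    lo = levels[max(k - 1, 0)]
    hi = levels[min(k + 1, len(levels) - 1)]
    d_s = math.log(hist[hi]) - math.log(hist[lo])
    if d_s == 0:
        return math.inf  # flat entropy: temperature is unbounded
    return (hi - lo) / d_s


if __name__ == "__main__":
    native = "HHHHHHPP"  # hypothetical native sequence
    hist = energy_histogram(len(native))
    e_nat = energy(native)
    print("native energy:", e_nat)
    print("sequences at or below native:", count_at_or_below(hist, e_nat))
    print("entropy at native energy:", entropy(hist)[e_nat])
    print("selection temperature:", selection_temperature(hist, e_nat))
```

Full enumeration is only feasible for very short toy sequences; for proteins of realistic length the sequence space is far too large, so some form of sampling or analytic estimate would be needed instead.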