Laboratório de Biofísica Teórica, Departamento de Biologia Celular, Universidade de Brasília, Brasília, Brazil.
Proteins. 2024 May;92(5):679-687. doi: 10.1002/prot.26658. Epub 2023 Dec 29.
Random energy models (REMs) provide a simple description of the energy landscapes that guide protein folding and evolution. The requirement of a large energy gap between the native structure and unfolded conformations, considered necessary for cooperative, protein-like folding behavior, indicates that proteins differ markedly from random heteropolymers. It has therefore been suggested that natural selection might have acted to choose nonrandom amino acid sequences satisfying this particular condition, implying that a large fraction of possible unselected random sequences would not fold to any structure. From an informational perspective, however, this scenario could indicate that protein structures, regarded as messages to be transmitted through a communication channel, would not be efficiently encoded in amino acid sequences, regarded as the channel for this transmission, since a large fraction of possible channel states would not be used. Here, we use a combined REM for conformations and sequences, with previously estimated parameters for natural proteins, to explore an alternative possibility in which the appropriate shape of the landscape results mainly from the deviation from randomness of the possible native structures rather than of the sequences. We observe that this situation emerges naturally if the distribution of conformational energies happens to arise from two independent contributions corresponding to sequence-dependent and sequence-independent terms. This construction is consistent with the hypothesis of a protein burial folding code, with native structures being determined by a modest amount of sequence-dependent atomic burial information together with sequence-independent constraints imposed by unspecific hydrogen bond formation.
More generally, an appropriate combination of sequence-dependent and sequence-independent information reconciles the possibility of an efficient structural encoding with the main physical requirement for folding, providing possible insight not only into the folding process but also into several aspects of sequence evolution, such as neutral networks, conformational coverage, and de novo gene emergence.
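The energy-gap argument and the two-contribution idea can be illustrated with a generic REM sketch. This is not the authors' combined REM over conformations and sequences; it assumes, for illustration only, that conformational energies are independent zero-mean Gaussians formed as the sum of a sequence-dependent term and a sequence-independent term, with hypothetical widths `sigma_dep` and `sigma_indep` (not the previously estimated parameters referred to above). Because the two terms are independent, their variances add, and the standard leading-order REM estimate places the lowest of Ω random energies near -σ√(2 ln Ω). A native state needs an energy well below that value, i.e. a positive gap, to behave cooperatively.

```python
import math
import random
import statistics


def sample_rem_energies(n_conformations, sigma_dep, sigma_indep, seed=0):
    """Sample REM conformational energies as the sum of two independent
    zero-mean Gaussian terms: a sequence-dependent part (std sigma_dep) and a
    sequence-independent part (std sigma_indep). Widths are hypothetical."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_dep) + rng.gauss(0.0, sigma_indep)
            for _ in range(n_conformations)]


def combined_sigma(sigma_dep, sigma_indep):
    """Variances of independent terms add, so the total width is the
    root-sum-of-squares of the two contributions."""
    return math.sqrt(sigma_dep ** 2 + sigma_indep ** 2)


def rem_expected_minimum(n_conformations, sigma):
    """Leading-order REM estimate of the lowest of n iid N(0, sigma^2)
    energies: E_min ~ -sigma * sqrt(2 ln n)."""
    if n_conformations < 1:
        raise ValueError("need at least one conformation")
    return -sigma * math.sqrt(2.0 * math.log(n_conformations))


def energy_gap(native_energy, n_conformations, sigma):
    """Gap between the expected random ground state and a native energy;
    positive when the native state lies below the random-conformation minimum."""
    return rem_expected_minimum(n_conformations, sigma) - native_energy


if __name__ == "__main__":
    sig_d, sig_i = 3.0, 4.0  # hypothetical widths, not fitted values
    n = 20000
    energies = sample_rem_energies(n, sig_d, sig_i)
    sigma = combined_sigma(sig_d, sig_i)
    print(f"sample std {statistics.pstdev(energies):.2f} vs predicted {sigma:.2f}")
    print(f"expected REM minimum {rem_expected_minimum(n, sigma):.2f}")
    print(f"gap for a native energy of -30: {energy_gap(-30.0, n, sigma):.2f}")
```

In this toy setting, shrinking `sigma_dep` while keeping the total width fixed leaves the random-conformation statistics unchanged, which is one way to picture how a modest sequence-dependent contribution can coexist with a landscape of the required overall shape.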