Wrabl James O, Larson Scott A, Hilser Vincent J
Department of Human Biological Chemistry and Genetics, University of Texas Medical Branch, Galveston 77555-1055, USA.
Protein Sci. 2002 Aug;11(8):1945-57. doi: 10.1110/ps.0203202.
To investigate the relationship between an amino acid sequence and its corresponding protein fold, a database of thermodynamic stability information, expressed as a function of residue type, was assembled from 81 nonhomologous proteins. This information was obtained using the COREX algorithm, which computes an ensemble-based description of the native state of proteins. Dissecting the COREX stability constant into its fundamental energetic components yielded 12 thermodynamic environments that describe the tertiary architecture of protein folds. Because residue types were observed to partition unequally among these environments, it was hypothesized that thermodynamic environments contain energetic information connecting sequence to fold. To test this hypothesis, the thermodynamic stability information was incorporated into a three-dimensional-to-one-dimensional (3D-1D) scoring matrix, and simple fold recognition experiments were performed such that information about the fold target was never included in the scoring. For 60 of 81 fold targets, the correct sequence for the target scored in the top 5% of 3858 decoy sequences, with Z-scores ranging from 1.76 to 12.23. Furthermore, a scoring matrix assembled from the residues of 40 nonhomologous all-alpha proteins was used to thread sequences against 12 nonhomologous all-beta protein targets. In 10 of 12 cases, sequences known to adopt the native all-beta structure scored in the top 5% of 3858 decoy sequences, with Z-scores ranging from 1.99 to 7.94. These results indicate that the energetic information encoded by thermodynamic environments represents a fundamental property of proteins that underlies classifications based on secondary structure.
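The abstract does not state the exact form of the 3D-1D scoring function, how decoys were aligned to targets, or how ranks were computed. The sketch below is therefore only an illustration under stated assumptions: a generic log-odds 3D-1D matrix, S(env, aa) = ln(P(aa | env) / P(aa)), in the style commonly used for environment-based threading; ungapped threading of sequences whose length equals the target's; leave-one-out exclusion of the target protein from the matrix; and a Z-score and top-fraction test against decoy scores. It does not reimplement COREX, and the integer environment labels, function names, and pseudocount are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_scoring_matrix(training, n_envs=12, pseudocount=1.0):
    """Hypothetical log-odds 3D-1D matrix: S[env, aa] = ln(P(aa | env) / P(aa)).

    training: list of (sequence, env_labels) pairs, where env_labels[i] is a
    hypothetical integer label in range(n_envs) assigned to residue i.
    """
    pair_counts, env_counts, aa_counts = Counter(), Counter(), Counter()
    total = 0
    for seq, envs in training:
        for aa, env in zip(seq, envs):
            pair_counts[(env, aa)] += 1
            env_counts[env] += 1
            aa_counts[aa] += 1
            total += 1
    n_aa = len(AMINO_ACIDS)
    matrix = {}
    for env in range(n_envs):
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen (env, aa) pairs at a finite score.
            p_given_env = (pair_counts[(env, aa)] + pseudocount) / (
                env_counts[env] + n_aa * pseudocount)
            p_background = (aa_counts[aa] + pseudocount) / (
                total + n_aa * pseudocount)
            matrix[(env, aa)] = math.log(p_given_env / p_background)
    return matrix


def leave_one_out(dataset, target_index):
    """Drop the target so none of its residues contribute to the matrix."""
    return dataset[:target_index] + dataset[target_index + 1:]


def thread_score(matrix, seq, envs):
    """Ungapped threading: sum the per-position scores of seq placed on envs."""
    if len(seq) != len(envs):
        raise ValueError("this sketch assumes equal sequence and target lengths")
    return sum(matrix[(env, aa)] for aa, env in zip(seq, envs))


def z_score(native, decoys):
    """Standard score of the native sequence relative to the decoy scores."""
    spread = pstdev(decoys)
    if spread == 0:
        raise ValueError("decoy scores have zero spread")
    return (native - mean(decoys)) / spread


def in_top_fraction(native, decoys, fraction=0.05):
    """True if at most `fraction` of decoys score at least as well as native."""
    at_least_as_good = sum(1 for d in decoys if d >= native)
    return at_least_as_good / len(decoys) <= fraction
```

In a fold recognition run under these assumptions, one would call `leave_one_out` for each target, build the matrix from the remaining proteins, score the native sequence and every decoy on the target's environment labels with `thread_score`, and report `z_score` and `in_top_fraction`. The cross-class experiment in the abstract corresponds to building the matrix only from all-alpha proteins and scoring against all-beta targets.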