Simons K T, Ruczinski I, Kooperberg C, Fox B A, Bystroff C, Baker D
Department of Biochemistry, University of Washington, Seattle 98195, USA.
Proteins. 1999 Jan 1;34(1):82-95. doi: 10.1002/(sici)1097-0134(19990101)34:1<82::aid-prot7>3.0.co;2-a.
We describe the development of a scoring function based on the decomposition P(structure | sequence) ∝ P(sequence | structure) * P(structure), which outperforms previous scoring functions in correctly identifying native-like protein structures in large ensembles of compact decoys. The first term captures sequence-dependent features of protein structures, such as the burial of hydrophobic residues in the core; the second term captures universal sequence-independent features, such as the assembly of beta-strands into beta-sheets. The efficacies of a wide variety of sequence-dependent and sequence-independent features of protein structures for recognizing native-like structures were systematically evaluated using ensembles of approximately 30,000 compact conformations with fixed secondary structure for each of 17 small protein domains. The best results were obtained using a core scoring function with P(sequence | structure) parameterized similarly to our previous work (Simons et al., J Mol Biol 1997;268:209-225) and P(structure) focused on secondary structure packing preferences. While several additional features had some discriminatory power on their own, they did not provide any additional discriminatory power when combined with the core scoring function. Our results, on both the training set and the independent decoy set of Park and Levitt (J Mol Biol 1996;258:367-392), suggest that this scoring function should contribute to the prediction of tertiary structure from knowledge of sequence and secondary structure.
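As a minimal illustration of the Bayesian decomposition described in the abstract, the sketch below ranks decoys by the sum of two log-probability terms, which equals the log of the posterior up to a constant that is the same for every decoy of a given sequence. The `Decoy` class, the function names, and all numeric values are hypothetical and chosen only for illustration; they are not the parameterization used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decoy:
    """A candidate conformation with two precomputed (hypothetical) log terms."""
    name: str
    log_p_seq_given_struct: float  # sequence-dependent term, e.g. hydrophobic burial
    log_p_struct: float            # sequence-independent term, e.g. strand pairing


def log_posterior_score(decoy: Decoy) -> float:
    """Unnormalized log P(structure | sequence).

    By Bayes' rule, log P(structure | sequence) equals
    log P(sequence | structure) + log P(structure) - log P(sequence).
    The last term is constant across decoys of one sequence, so it is dropped.
    """
    return decoy.log_p_seq_given_struct + decoy.log_p_struct


def rank_decoys(decoys: List[Decoy]) -> List[Decoy]:
    """Return decoys ordered from most to least native-like under the score."""
    return sorted(decoys, key=log_posterior_score, reverse=True)


if __name__ == "__main__":
    # Made-up values for three imaginary decoys of a single sequence.
    ensemble = [
        Decoy("decoy_a", log_p_seq_given_struct=-10.0, log_p_struct=-5.0),
        Decoy("decoy_b", log_p_seq_given_struct=-8.0, log_p_struct=-4.0),
        Decoy("decoy_c", log_p_seq_given_struct=-8.0, log_p_struct=-9.0),
    ]
    for d in rank_decoys(ensemble):
        print(d.name, log_posterior_score(d))
```

Working in log space turns the product of the two probabilities into a sum and avoids numerical underflow when many small per-residue probabilities are multiplied together.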