T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA.
Protein Sci. 2010 Jun;19(6):1127-36. doi: 10.1002/pro.399.
We present a method with the potential to generate a library of coil segments from first principles. Proteins are built from alpha-helices and/or beta-strands interconnected by these coil segments. Here, we investigate the conformational determinants of short coil segments, with particular emphasis on chain turns. Toward this goal, we extracted a comprehensive set of two-, three-, and four-residue turns from X-ray-elucidated proteins and classified them by conformation. A remarkably small number of unique conformers accounts for most of this experimentally determined set, whereas the remaining members span a large number of rare conformers, many occurring only once in the entire protein database. Factors determining conformation were identified via Metropolis Monte Carlo simulations devised to test the effectiveness of various energy terms. Simulated structures were validated by comparison to experimental counterparts. After filtering rare conformers, we found that 98% of the remaining experimentally determined turn population could be reproduced by applying a hydrogen bond energy term to an exhaustively generated ensemble of clash-free conformers in which no backbone polar group lacks a hydrogen-bond partner. Further, at least 90% of longer coil segments, ranging from 5 to 20 residues, were found to be structural composites of these shorter primitives. These results are pertinent to protein structure prediction, where approaches can be divided into either empirical or ab initio methods. Empirical methods use database-derived information; ab initio methods rely on physical-chemical principles exclusively. Replacing the database-derived coil library with one generated from first principles would transform any empirically based method into its corresponding ab initio homologue.
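For readers unfamiliar with the Metropolis Monte Carlo procedure mentioned in the abstract, the following is a minimal illustrative sketch of the generic acceptance rule applied to a short list of backbone torsion angles. It is not the authors' implementation: the energy function `toy_energy`, the step size, the temperature `kT`, and all function names are hypothetical placeholders chosen for illustration, and the paper's hydrogen-bond term and clash filter are not reproduced here.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Generic Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def toy_energy(angles):
    """Hypothetical placeholder energy (NOT the paper's hydrogen-bond term):
    a periodic well with its minimum at -60 degrees for every torsion."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def wrap(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def run_metropolis(n_angles=4, n_steps=5000, kT=0.1, max_step=30.0, seed=0):
    """Sample torsion angles by perturbing one angle per step and
    accepting or rejecting the trial with the Metropolis criterion."""
    rng = random.Random(seed)
    angles = [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
    energy = toy_energy(angles)
    for _ in range(n_steps):
        i = rng.randrange(n_angles)          # pick one torsion to perturb
        trial = list(angles)
        trial[i] = wrap(trial[i] + rng.uniform(-max_step, max_step))
        trial_energy = toy_energy(trial)
        if metropolis_accept(trial_energy - energy, kT, rng):
            angles, energy = trial, trial_energy
    return angles, energy


if __name__ == "__main__":
    final_angles, final_energy = run_metropolis()
    print(final_angles, final_energy)
```

In a study of the kind described, the placeholder energy would be replaced by the candidate energy terms under test, and the sampled conformers would then be compared against the experimentally observed turns.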