Zhang Chi, Liu Song, Zhou Hongyi, Zhou Yaoqi
Howard Hughes Medical Institute Center for Single Molecule Biophysics, SUNY Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA.
Protein Sci. 2004 Feb;13(2):400-11. doi: 10.1110/ps.03348304.
Structure prediction on a genomic scale requires a simplified energy function that can efficiently sample the conformational space of polypeptide chains. At a minimum, a good energy function should discriminate native structures from decoys. Here, we show that a recently developed, residue-specific, all-atom knowledge-based potential (167 atomic types) based on a distance-scaled, finite ideal-gas reference state (DFIRE-all-atom) can be substantially simplified to 20 residue types located at the side-chain center of mass (DFIRE-SCM) without a significant change in its capability for structure discrimination. Using 96 standard multiple decoy sets, we show that there is only a small reduction (from 80% to 78%) in the success rate of ranking the native structure first. This success rate is higher than that of two previously developed, all-atom, distance-dependent statistical pair potentials. Applied without modification to structure selection for 21 docking decoy sets, the DFIRE-SCM potential is 29% more successful in recognizing native complex structures than an all-atom statistical potential trained on a database of dimeric interfaces. The potential also achieves 92% accuracy in distinguishing true dimeric interfaces from artificial crystal interfaces. In addition, the DFIRE potential with the Cα positions as the interaction centers recognizes 123 native structures out of a comprehensive 125-protein TOUCHSTONE decoy set in which each protein has 24,000 decoys with only Cα positions. Furthermore, the performance of DFIRE-SCM on 25 newly established monomeric and 31 newly established docking Rosetta decoy sets is comparable to, or for the monomeric decoy sets better than, that of a recently developed all-atom Rosetta energy function enhanced with an orientation-dependent hydrogen-bonding potential.
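The success rates quoted above (for example, 78% over 96 decoy sets) count the fraction of decoy sets in which the native structure receives the lowest energy. The following is a minimal sketch of that bookkeeping, not the authors' code: the function names, the dictionary layout, and the energy values are all hypothetical, and the energies are assumed to have been computed beforehand by some scoring function.

```python
from typing import Dict, List


def native_ranked_first(native_energy: float, decoy_energies: List[float]) -> bool:
    """Return True if the native structure scores strictly lower than every decoy."""
    return all(native_energy < e for e in decoy_energies)


def top1_success_rate(decoy_sets: Dict[str, dict]) -> float:
    """Fraction of decoy sets in which the native structure is ranked first.

    Each entry maps a (hypothetical) protein name to a dict holding the
    native energy under "native" and a list of decoy energies under "decoys".
    """
    if not decoy_sets:
        raise ValueError("at least one decoy set is required")
    hits = sum(
        native_ranked_first(s["native"], s["decoys"]) for s in decoy_sets.values()
    )
    return hits / len(decoy_sets)


# Made-up energies in arbitrary units, for illustration only.
toy_sets = {
    "protA": {"native": -120.0, "decoys": [-95.2, -101.7, -88.4]},
    "protB": {"native": -80.5, "decoys": [-60.1, -79.9]},
    "protC": {"native": -45.0, "decoys": [-50.3, -30.8]},  # one decoy beats the native
    "protD": {"native": -200.3, "decoys": [-150.0, -175.6, -199.9]},
}

rate = top1_success_rate(toy_sets)
print(f"Top-1 success rate: {rate:.0%}")
```

In this sketch a tie between the native structure and a decoy counts as a failure; whether the original study treats ties that way is not stated in the abstract.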