Template-free protein structure prediction and quality assessment with an all-atom free-energy model.

Author information

Gopal Srinivasa Murthy, Klenin Konstantin, Wenzel Wolfgang

Affiliation

Forschungszentrum Karlsruhe, Institute for Nanotechnology, PO Box 3640, 76021 Karlsruhe, Germany.

Publication information

Proteins. 2009 Nov 1;77(2):330-41. doi: 10.1002/prot.22438.

DOI:10.1002/prot.22438

PMID:19422063

Abstract

Biophysical forcefields have contributed less than originally anticipated to recent progress in protein structure prediction. Here, we investigated the selectivity of a recently developed all-atom free-energy forcefield for protein structure prediction and quality assessment (QA). Using a heuristic method that excludes homology, we generated decoy sets for all targets of the CASP7 protein structure prediction assessment with fewer than 150 amino acids. The decoys in each set were then ranked by energy in short relaxation simulations, and the best low-energy cluster was submitted as a prediction. For four of nine template-free targets, this approach generated high-ranking predictions within the top 10 models submitted in CASP7 for the respective targets. For these targets, our de novo predictions had an average GDT_S score of 42.81, significantly above the average of all groups. The refinement protocol has difficulty with oligomeric targets and with targets for which the decoy library contains no near-native decoys. For targets with high-quality decoy sets, the refinement approach was highly selective. Motivated by this observation, we rescored all server submissions of up to 200 amino acids in a QA exercise, using a similar refinement protocol but no clustering. We found an excellent correlation between the best server models and those with the lowest energy in the forcefield. The free-energy refinement protocol may thus be an efficient tool for relative QA and protein structure prediction.

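The selection step described in the abstract (rank decoys by energy, then pick the best low-energy cluster) can be illustrated with a minimal Python sketch. It is not the authors' protocol: the abstract does not say how clusters are formed or scored, so the sketch assumes that cluster labels are already available, that the "best" cluster is the one with the lowest mean energy, and that its representative is its lowest-energy member. The function names, decoy names, and energy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_by_energy(energies):
    """Return decoy names ordered from lowest to highest energy."""
    return sorted(energies, key=energies.get)


def pick_low_energy_cluster(energies, cluster_of):
    """Return (cluster label, representative decoy).

    Assumption (not from the paper): the "best" cluster is the one with
    the lowest mean energy, and its representative is the member with
    the lowest energy.
    """
    members = defaultdict(list)
    for name, label in cluster_of.items():
        members[label].append(name)
    best = min(members, key=lambda lab: mean(energies[n] for n in members[lab]))
    representative = min(members[best], key=energies.get)
    return best, representative


# Toy values invented for illustration only (arbitrary energy units).
energies = {"d1": -120.0, "d2": -135.5, "d3": -98.2, "d4": -131.0, "d5": -101.7}
cluster_of = {"d1": "A", "d2": "B", "d3": "A", "d4": "B", "d5": "C"}

ranking = rank_by_energy(energies)
best_cluster, representative = pick_low_energy_cluster(energies, cluster_of)
print(ranking)
print(best_cluster, representative)
```

In the QA setting described in the abstract, only the ranking step would apply, since the server models were rescored without clustering.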