Faver John C, Benson Mark L, He Xiao, Roberts Benjamin P, Wang Bing, Marshall Michael S, Kennedy Matthew R, Sherrill C David, Merz Kenneth M
Quantum Theory Project, The University of Florida, 2328 New Physics Building, P.O. Box 118435, Gainesville, FL 32611-8435.
J Chem Theory Comput. 2011 Mar 8;7(3):790-797. doi: 10.1021/ct100563b.
A largely unsolved problem in computational biochemistry is the accurate prediction of binding affinities of small ligands to protein receptors. We present a detailed analysis of the systematic and random errors present in computational methods through the use of error probability density functions, specifically for computed interaction energies between the chemical fragments that make up a protein-ligand complex. An HIV-II protease crystal structure with a bound ligand (indinavir) was chosen as a model protein-ligand complex. The complex was decomposed into 21 interacting fragment pairs, which were studied using a number of computational methods. Chemically accurate interaction energies from coupled cluster theory at the complete basis set limit (CCSD(T)/CBS) were used as reference values to generate our error estimates. We observed significant systematic and random errors in most methods, which was especially surprising for the parameterized classical and semiempirical quantum mechanical methods. After these fragment-based error estimates are propagated over the entire protein-ligand complex, the total error estimates for many methods are large compared to the experimentally determined free energy of binding. We therefore conclude that statistical error analysis is a necessary addition to any scoring function that attempts to produce reliable binding affinity predictions.
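The propagation step can be illustrated with a minimal sketch. It assumes, as in standard error propagation, that the errors of individual interactions are independent and drawn from the same distribution, so that the systematic part (the mean signed error) adds linearly while the random part (the standard deviation) adds in quadrature. The abstract does not spell out the propagation formula, so this is an assumption rather than the authors' stated procedure. The function names and all numerical values are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def error_stats(computed, reference):
    """Return (mean signed error, sample standard deviation of the errors).

    The mean estimates the systematic error per interaction;
    the standard deviation estimates the random error per interaction.
    """
    errors = [c - r for c, r in zip(computed, reference)]
    return mean(errors), stdev(errors)


def propagate(mu, sigma, n_interactions):
    """Propagate per-interaction errors over n independent interactions.

    Assumption: systematic errors add linearly, random errors in quadrature.
    """
    systematic = n_interactions * mu
    random = math.sqrt(n_interactions) * sigma
    return systematic, random


# Hypothetical interaction energies (kcal/mol) for four fragment pairs:
# values from some approximate method and from a reference method.
computed = [-1.0, -2.5, -0.5, -3.0]
reference = [-1.5, -2.0, -1.0, -3.5]

mu, sigma = error_stats(computed, reference)

# Hypothetical count of fragment-fragment interactions in a full complex.
total_systematic, total_random = propagate(mu, sigma, 16)
print(f"per-interaction: mu={mu:.2f}, sigma={sigma:.2f}")
print(f"propagated: systematic={total_systematic:.2f}, random={total_random:.2f}")
```

Under these assumptions the systematic term grows in proportion to the number of interactions and the random term with its square root. Even a modest per-fragment error can therefore become comparable to a binding free energy once it is summed over a whole complex.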