Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan.
J Chem Inf Model. 2011 Oct 24;51(10):2528-37. doi: 10.1021/ci200220v. Epub 2011 Oct 7.
Ordinary least-squares (OLS) regression has been widely used to construct scoring functions for protein-ligand interactions. However, OLS is very sensitive to outliers, and models built with it are easily affected by outliers or even by the choice of data set. Separately, the determination of atomic charges is regarded as centrally important, because the electrostatic interaction is known to be a key contributing factor in biomolecular association. In the development of the AutoDock4 scoring function, only OLS was used, and the simple Gasteiger method was adopted for charges. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared error: 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment of binding pose prediction with the 100 external decoy sets indicates a high success rate of 87%, using the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
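To illustrate the contrast between OLS and robust regression described above, the following is a minimal sketch rather than the authors' procedure. The abstract does not say which robust estimator was used, so the Huber M-estimator fitted by iteratively reweighted least squares is an assumption here, as are the tuning constant `k`, the single-predictor model, and the synthetic data, which are not taken from the study. The function names are hypothetical.

```python
import math
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def ols_fit(x, y):
    """Ordinary least squares: every point gets the same weight."""
    return weighted_line_fit(x, y, [1.0] * len(x))


def huber_fit(x, y, k=1.345, n_iter=50):
    """Huber M-estimate via iteratively reweighted least squares (assumed method)."""
    a, b = ols_fit(x, y)  # start from the OLS solution
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute deviation (MAD).
        med = statistics.median(resid)
        mad = statistics.median([abs(r - med) for r in resid])
        scale = max(mad / 0.6745, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking as 1/|r| outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a, b = weighted_line_fit(x, y, w)
    return a, b


def rmse(x, y, a, b):
    """Root-mean-squared error of the line y = a + b*x on the given points."""
    return math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x))


# Synthetic toy data (not from the paper): y = 1 + 2x plus small noise,
# with one gross outlier at the last point.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1, -0.2, 0.0]
x = [float(i) for i in range(12)]
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]
y[-1] += 25.0

a_ols, b_ols = ols_fit(x, y)
a_hub, b_hub = huber_fit(x, y)

# Error measured on the 11 well-behaved points only.
rmse_ols = rmse(x[:-1], y[:-1], a_ols, b_ols)
rmse_hub = rmse(x[:-1], y[:-1], a_hub, b_hub)
print(f"OLS slope   {b_ols:.3f}  inlier RMSE {rmse_ols:.3f}")
print(f"Huber slope {b_hub:.3f}  inlier RMSE {rmse_hub:.3f}")
```

On this toy set a single outlier pulls the OLS slope well away from the underlying value of 2, while the reweighted fit stays close to it. A scoring function such as AutoDock4's has several energy terms, so a real application would use a multivariate fit; the one-predictor form is used here only to keep the sketch short.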