The Quantum Theory Project, The University of Florida, 2328 New Physics Building, P.O. Box 118435, Gainesville, FL 32611-8435, USA.
J Comput Aided Mol Des. 2012 May;26(5):647-59. doi: 10.1007/s10822-012-9567-9. Epub 2012 Apr 4.
We describe two families of binding affinity estimation methods that were used in the SAMPL3 trypsin/fragment binding affinity challenge. The first is a free energy decomposition scheme based on a thermodynamic cycle, with separate contributions from the enthalpy of binding, the entropy of binding, and the solvent. Enthalpic contributions were estimated with PM6-DH2 semiempirical quantum mechanical interaction energies, which were modified with a statistical error correction procedure. Entropic contributions were estimated with the rigid-rotor harmonic approximation, and solvent contributions to the free energy were estimated with several different methods. The second general methodology is the empirical score LISA, which contains several physics-based terms trained on the large PDBBind database of protein/ligand complexes. Here we also introduce LISA+, an updated version of LISA which, prior to scoring, classifies systems into one of four classes based on a ligand's hydrophobicity and molecular weight. Each version of the two methodologies (a total of 11 methods) was trained against a compiled set of known trypsin binders available in the Protein Data Bank to yield scaling parameters for linear regression models. Both raw and scaled scores were submitted to SAMPL3. Variants of LISA showed relatively low absolute errors but also low correlation with experiment, while the free energy decomposition methods had modest success when scaling factors were included. Nonetheless, re-scaled LISA yielded the best predictions in the challenge in terms of RMS error, and six of these models placed in the top ten by RMS error. This work highlights some of the difficulties of predicting binding affinities of small molecular fragments to protein receptors, as well as the benefit of using training data.
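The scaling step described in the abstract can be illustrated with a minimal sketch. It assumes the raw score is the sum ΔH − TΔS + ΔG_solv and that a single slope and intercept are fitted by ordinary least squares against known binders. The function names, the sign convention, and the single-variable fit are illustrative assumptions, not the authors' implementation; their models may have scaled individual terms separately.

```python
from math import sqrt


def raw_binding_score(delta_h, t_delta_s, delta_g_solv):
    """Assumed decomposition: dG ~ dH - T*dS + dG_solv (all in kcal/mol)."""
    return delta_h - t_delta_s + delta_g_solv


def fit_scaling(raw_scores, experimental):
    """Ordinary least-squares fit of experimental ~ slope * raw + intercept."""
    n = len(raw_scores)
    mean_x = sum(raw_scores) / n
    mean_y = sum(experimental) / n
    sxx = sum((x - mean_x) ** 2 for x in raw_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(raw_scores, experimental))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_scaling(raw_scores, slope, intercept):
    """Map raw scores onto the experimental scale with the fitted parameters."""
    return [slope * x + intercept for x in raw_scores]


def rms_error(predicted, reference):
    """Root-mean-square error between predictions and reference values."""
    n = len(predicted)
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
```

In use, `fit_scaling` would be called on raw scores and measured affinities for the training set of known trypsin binders, and `apply_scaling` on the raw scores of the challenge fragments. `rms_error` then compares scaled predictions with experiment.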