Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, United States.
J Chem Inf Model. 2011 Sep 26;51(9):2047-65. doi: 10.1021/ci1003009. Epub 2011 Jun 6.
The performance of several two-step scoring approaches for molecular docking was assessed in terms of their ability to predict binding geometries and free energies. Two new scoring functions designed for "step 2 discrimination" were proposed and compared to our CHARMM implementation of the linear interaction energy (LIE) approach using the Generalized-Born with Molecular Volume (GBMV) implicit solvation model. A scoring function, S1, was proposed that treats only the "interacting" ligand atoms as the "effective size" of the ligand, and it was extended to an empirical regression-based pair potential, S2. The S1 and S2 scoring schemes were trained and 5-fold cross-validated on a diverse set of 259 protein-ligand complexes from the Ligand Protein Database (LPDB). The regression-based parameters for S1 and S2 also demonstrated reasonable transferability in the CSARdock 2010 benchmark using a new data set (NRC HiQ) of diverse protein-ligand complexes. The ability of the scoring functions to accurately predict ligand geometry was evaluated by calculating their discriminative power (DP) to identify native poses. The parameters for the LIE scoring function with the optimal DP for geometry (step 1 discrimination) were found to be very similar to the best-fit parameters for binding free energy over a large number of protein-ligand complexes (step 2 discrimination). Reasonable performance of the scoring functions in the enrichment of active compounds in four different protein target classes established that the parameters for S1 and S2 provided reasonable accuracy and transferability. Additional analysis was performed to definitively separate scoring function performance from molecular weight effects. This analysis included the prediction of ligand binding efficiencies for a subset of the CSARdock NRC HiQ data set in which the number of ligand heavy atoms ranged from 17 to 35. This range of ligand heavy atoms is where improved accuracy of predicted ligand efficiencies is most relevant to real-world drug design efforts.
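The abstract does not define how ligand binding efficiency is computed. As a minimal sketch, the following assumes the commonly used definition (the negative binding free energy divided by the number of ligand heavy atoms) and restricts the analysis to the 17 to 35 heavy-atom range mentioned above. The `Ligand` class, the function names, and the example values are hypothetical and are not taken from the paper or from the CSARdock NRC HiQ data set.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    """Hypothetical record for one scored ligand (not from the paper)."""
    name: str
    heavy_atoms: int   # number of non-hydrogen ligand atoms
    dg_bind: float     # binding free energy in kcal/mol (negative = favorable)


def ligand_efficiency(dg_bind: float, heavy_atoms: int) -> float:
    """Assumed definition: LE = -dG_bind / N_heavy (kcal/mol per heavy atom)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -dg_bind / heavy_atoms


def select_by_size(ligands, lo=17, hi=35):
    """Keep ligands whose heavy-atom count lies in [lo, hi], inclusive."""
    return [lig for lig in ligands if lo <= lig.heavy_atoms <= hi]


# Made-up illustrative values, chosen only to exercise the functions.
example_ligands = [
    Ligand("lig_a", 12, -6.0),
    Ligand("lig_b", 20, -10.0),
    Ligand("lig_c", 35, -7.0),
    Ligand("lig_d", 40, -12.0),
]

selected = select_by_size(example_ligands)
efficiencies = {
    lig.name: ligand_efficiency(lig.dg_bind, lig.heavy_atoms) for lig in selected
}
```

Normalizing by heavy-atom count is one simple way to compare ligands of different sizes on the same footing, which is in the spirit of separating scoring performance from molecular weight effects; the paper's actual protocol may differ.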