Ji Beihong, He Xibing, Zhang Yuzhao, Zhai Jingchen, Man Viet Hoang, Liu Shuhan, Wang Junmei
Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
J Cheminform. 2021 Feb 15;13(1):11. doi: 10.1186/s13321-021-00493-4.
In this study, we developed a novel algorithm that improves the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound according to its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with the new algorithm, and comparable improvements were achieved for both. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets, especially when CSE = S (where S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 for the Glide scoring function and from 0.39 to 0.71 for the AutoDock Vina scoring function. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to the reference ligands. Under this constraint, the hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated using external datasets for two systems, for which the PI values increased from 0.24 to 0.46 (A2AR) and from 0.14 to 0.42 (CFX).
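The Tanimoto structural similarity S used above can be illustrated with a minimal sketch. It assumes each fingerprint is represented as a Python set of on-bit indices; the function names `tanimoto` and `references_below_cap` and the `cap` parameter are hypothetical, and the way the upper limit is applied here (dropping reference compounds whose similarity to the query exceeds the limit) is an assumption for illustration, not necessarily the procedure used in the study.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(fp_a & fp_b) / union


def references_below_cap(query_fp, reference_fps, cap=1.0):
    """Return (index, similarity) pairs for references with similarity <= cap.

    Assumption: the similarity upper limit is imposed by ignoring reference
    compounds that are too similar to the query.
    """
    pairs = []
    for index, ref_fp in enumerate(reference_fps):
        s = tanimoto(query_fp, ref_fp)
        if s <= cap:
            pairs.append((index, s))
    return pairs


if __name__ == "__main__":
    query = {1, 2, 3}
    references = [{2, 3, 4}, {1, 2, 3}, {7, 8}]
    print(references_below_cap(query, references, cap=0.8))
```

In practice the on-bit sets would come from a cheminformatics toolkit that generates FP2, FP3, FP4, or MACCS fingerprints; that step is omitted here.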
In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and identification phases at negligible computational cost.
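The abstract reports screening performance as a predictive index (PI) without defining it. The sketch below assumes the commonly used pairwise definition, in which every pair of compounds is weighted by the absolute difference of their experimental values and scored +1 when the predicted ordering agrees with the experimental one, -1 when it disagrees, and 0 when the predictions are tied. The function name `predictive_index` is hypothetical, and both input lists are assumed to use the same sign convention (for example, more negative means stronger binding in both).

```python
from itertools import combinations


def predictive_index(experimental, predicted):
    """Weighted rank-agreement score in [-1, 1] (assumed pairwise PI definition)."""
    numerator = 0.0
    denominator = 0.0
    for (e_i, p_i), (e_j, p_j) in combinations(zip(experimental, predicted), 2):
        d_exp = e_j - e_i
        d_pred = p_j - p_i
        weight = abs(d_exp)  # pairs with larger experimental gaps count more
        if d_exp == 0 or d_pred == 0:
            agreement = 0  # ties carry no ranking information
        elif (d_exp > 0) == (d_pred > 0):
            agreement = 1  # predicted order matches experimental order
        else:
            agreement = -1  # predicted order is reversed
        numerator += weight * agreement
        denominator += weight
    # With no experimental spread at all, PI is undefined; return 0.0 here.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    print(predictive_index([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Under this definition a value of 1 means perfect rank agreement, 0 corresponds to random ordering, and -1 means the ranking is fully inverted.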