Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab054.
Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule binding to a drug target and to prioritize the top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models and thereby significantly improve the screening performance in SBVSs. We call such a prediction model an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods applied before the interaction energy calculation and different ML algorithms. Using six drug targets, each with hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed those of the Glide SF. First, compared with Glide, the average values of the mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38% and 36%, respectively. Second, the mean values of the squared correlation coefficient and predictive index increased by about 225% and 73%, respectively. Third, and more encouragingly, the average area under the receiver operating characteristic curve across the six targets was 0.87 for GBDT, significantly better than the 0.71 obtained with Glide. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
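The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names are hypothetical, the toy affinities and scores are invented purely for illustration, and the predictive index is assumed to follow the commonly used Pearlman-Charifson pairwise definition, which the abstract does not spell out.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error (scoring power)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error (scoring power)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Squared Pearson correlation coefficient."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def predictive_index(y_true, y_pred):
    """Predictive index (ranking power), ASSUMED Pearlman-Charifson form:
    each ligand pair is weighted by its experimental difference and scores
    +1 if ranked correctly, -1 if ranked wrongly, 0 if the predictions tie."""
    num = den = 0.0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            weight = abs(y_true[j] - y_true[i])
            agreement = (y_true[j] - y_true[i]) * (y_pred[j] - y_pred[i])
            num += weight * ((agreement > 0) - (agreement < 0))
            den += weight
    return num / den


def roc_auc(labels, scores):
    """ROC AUC (screening power): probability that a random active (label 1)
    outscores a random inactive (label 0); ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def relative_change_percent(old, new):
    """Percentage change of `new` relative to the baseline `old`."""
    return (new - old) / old * 100.0


# Toy data, invented for illustration only (not from the study).
experimental = [-9.0, -8.0, -7.0, -6.0]   # hypothetical binding free energies
predicted = [-8.5, -8.5, -6.0, -6.5]      # hypothetical model outputs
is_active = [1, 1, 0, 0]                  # hypothetical active/inactive labels
screen_score = [0.9, 0.4, 0.5, 0.1]       # higher means "more likely active"

print("MAE :", mae(experimental, predicted))
print("RMSE:", rmse(experimental, predicted))
print("R^2 :", r_squared(experimental, predicted))
print("PI  :", predictive_index(experimental, predicted))
print("AUC :", roc_auc(is_active, screen_score))
# Average AUC values quoted in the abstract: 0.71 (Glide) vs 0.87 (GBDT).
print("AUC change (%):", relative_change_percent(0.71, 0.87))
```

The last line applies the same relative-change arithmetic that underlies statements such as "decreased by about 38%" to the two average AUC values quoted in the abstract; going from 0.71 to 0.87 corresponds to a relative increase of roughly 22.5%.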