Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions.

Author information

Shen Chao, Hu Ye, Wang Zhe, Zhang Xujun, Zhong Haiyang, Wang Gaoang, Yao Xiaojun, Xu Lei, Cao Dongsheng, Hou Tingjun

Publication information

Brief Bioinform. 2021 Jan 18;22(1):497-514. doi: 10.1093/bib/bbz173.

Abstract

Accurately estimating protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, the binding affinities predicted by classical scoring functions (SFs) have been shown to correlate poorly with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and have outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we conducted a comparative assessment of 25 commonly used SFs. The scoring power of each SF was systematically re-estimated by refitting its individual energy terms with state-of-the-art ML methods in place of the original multiple linear regression. The results show that the newly developed ML-based SFs consistently performed better than the classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when the training set contained sufficient similar targets. Moreover, the effect of combining features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not feasible to derive a generic target-specific SF or SF combination.
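The refitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the three "energy terms" and the affinity values below are synthetic and hypothetical, the boosted model is a toy gradient-boosting regressor built from one-split stumps rather than a production GBDT, and the linear baseline is fitted by plain gradient descent instead of a closed-form solver so that the example needs only the Python standard library. Scoring power is assumed here to be the Pearson correlation between predicted and reference affinities on a held-out set, which is how it is commonly reported.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def fit_linear(X, y, lr=0.3, epochs=500):
    """Multiple linear regression baseline, fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [b + sum(wj * xj for wj, xj in zip(w, row)) - yi
                for row, yi in zip(X, y)]
        b -= lr * sum(errs) / n
        w = [wj - lr * sum(e * row[j] for e, row in zip(errs, X)) / n
             for j, wj in enumerate(w)]
    return lambda row: b + sum(wj * xj for wj, xj in zip(w, row))

def fit_stump(X, residuals, n_bins=15):
    """Find the one-split regression stump with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        lo, hi = min(row[j] for row in X), max(row[j] for row in X)
        if lo == hi:
            continue  # a constant feature cannot be split
        for k in range(1, n_bins):
            t = lo + (hi - lo) * k / n_bins
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_gbdt(X, y, n_rounds=150, lr=0.2):
    """Toy gradient boosting (squared loss): each stump fits the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + lr * (lm if row[j] <= t else rm)
                 for p, row in zip(preds, X)]
    def predict(row):
        return base + lr * sum(lm if row[j] <= t else rm
                               for j, t, lm, rm in stumps)
    return predict

def make_complex():
    """Hypothetical complex: 3 synthetic energy terms, non-linear affinity."""
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = 6 + 2 * x[0] ** 2 - 1.5 * abs(x[1]) + 0.5 * x[2] + random.gauss(0, 0.1)
    return x, y

random.seed(0)
data = [make_complex() for _ in range(300)]
train, test = data[:200], data[200:]
X_train, y_train = [x for x, _ in train], [y for _, y in train]
X_test, y_test = [x for x, _ in test], [y for _, y in test]

linear = fit_linear(X_train, y_train)
gbdt = fit_gbdt(X_train, y_train)

# Scoring power on the held-out set, taken as Pearson's R (assumption).
r_linear = pearson([linear(x) for x in X_test], y_test)
r_gbdt = pearson([gbdt(x) for x in X_test], y_test)
print(f"linear refit  R = {r_linear:.3f}")
print(f"boosted refit R = {r_gbdt:.3f}")
```

The synthetic target is deliberately non-linear in two of the three terms, so the gap between the two fits says nothing about real scoring functions; it only shows the mechanics of swapping the regression step while keeping the same input terms. With real data, the rows of `X_train` would hold the per-complex energy terms emitted by one SF, and a library implementation of GBDT or RF would normally replace the toy booster.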
