Pal Sapna, Pal Ankita, Mohanty Debasisa
Bioinformatics Center, National Institute of Immunology, New Delhi, India.
Protein Sci. 2025 Jan;34(1):e5257. doi: 10.1002/pro.5257.
Computational methods for predicting the binding affinity of protein-ligand complexes have been used extensively to design inhibitors for proteins selected as drug targets. In recent years, machine learning (ML) has been increasingly used for the design of drugs and inhibitors. However, ranking compounds according to their experimental binding affinity has remained a major challenge. It is therefore necessary to develop ML-based scoring functions (MLSFs) for predicting the binding affinity of protein-ligand complexes. In this work, protein-ligand interaction features, namely extended connectivity interaction fingerprints (ECIF), derived from the PDBbind dataset were used to build ML models for binding affinity prediction. Benchmarking was done on the Comparative Assessment of Scoring Functions (CASF) dataset and also by predicting the binding affinity of unseen protein-ligand complexes whose structural features differ from those present in the training dataset. Furthermore, the performance of the MLSF on redocked CASF complexes generated by AutoDock Vina improved when the training set of crystal structures was supplemented with redocked protein-ligand complexes. The MLSF trained on crystal structures alone, using a combination of ECIF and Vina features, also predicted the binding affinities of both crystal and docked complexes with high accuracy. Overall, the MLSF developed in this work shows improved performance compared with conventional scoring functions (SFs) and several other MLSFs. It will be a valuable resource for identifying novel inhibitors through structure-based virtual screening protocols. The proposed MLSF, SG-ML-PLAP (Structure-Guided Machine-Learning-based Protein-Ligand Affinity Predictor), is freely accessible as a webserver at http://www.nii.ac.in/sg-ml-plap.html.
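As a rough illustration of what an interaction fingerprint of this kind encodes, the sketch below counts protein-ligand atom-type pairs that fall within a distance cutoff. It is a simplified stand-in, not the ECIF implementation used in this work: the function name `pair_count_fingerprint`, the bare element labels used as atom types, the 6.0 Å cutoff, and the toy coordinates are all assumptions made for illustration. Actual ECIF atom types carry more chemical detail than an element symbol.

```python
from collections import Counter
from math import dist


def pair_count_fingerprint(protein_atoms, ligand_atoms, cutoff=6.0):
    """Count protein-ligand atom-type pairs no farther apart than `cutoff`.

    Each atom is a (type_label, (x, y, z)) tuple with coordinates in angstroms.
    The cutoff value is an assumption chosen for this sketch.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            if dist(p_xyz, l_xyz) <= cutoff:
                # Key is ordered (protein type, ligand type).
                counts[(p_type, l_type)] += 1
    return counts


# Toy coordinates, invented for illustration only.
protein = [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]
ligand = [("O", (0.0, 4.0, 0.0)), ("C", (1.0, 1.0, 1.0))]

fingerprint = pair_count_fingerprint(protein, ligand)
```

In a pipeline of the kind described above, such counts would be laid out as a fixed-length vector over all possible type pairs and passed, possibly alongside other descriptors, to a regression model trained on experimental affinities.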