Li Jiashan, Gong Xinqi
Institute for Mathematical Sciences, School of Mathematics, Renmin University of China, 59 Zhongguancun Street, Beijing, 100872, China.
BMC Bioinformatics. 2025 Feb 17;26(1):55. doi: 10.1186/s12859-025-06064-w.
Protein-ligand binding plays a crucial role in drug discovery, yet predicting it remains challenging. Existing methods are constrained by the limited availability of labeled data and often perform poorly on complex protein-ligand interactions. Many models also struggle to capture the conformational flexibility of proteins and ligands and their relative spatial relationships. These limitations hinder progress in protein-ligand binding research and reduce the accuracy and efficiency of drug discovery. This study aims to improve predictive performance and thereby provide more reliable support for drug discovery.
This study leverages a spatially aware pre-trained model to improve the prediction of protein-ligand binding affinity. By perturbing small-molecule structures in a manner consistent with physical constraints and training on self-supervised tasks, we improve the representation of small-molecule structures so that it adapts better to affinity prediction. The approach also enables the identification of potential binding sites on proteins.
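As an illustration of what a physically consistent perturbation could look like, the following minimal sketch rotates part of a molecule about a bond, which changes the conformation while leaving bond lengths intact. It is an assumption for illustration, not the authors' procedure: the abstract does not specify the perturbation type, and the function names (`rotate_about_axis`, `perturb_torsion`), the angle range, and the four-atom toy coordinates are all hypothetical.

```python
import math
import random


def rotate_about_axis(point, origin, axis, angle):
    """Rotate `point` by `angle` radians about the line through `origin`
    along `axis`, using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                      # unit rotation axis
    v = [p - o for p, o in zip(point, origin)]        # point relative to origin
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
        for i in range(3)
    ]
    return [r + o for r, o in zip(rotated, origin)]


def perturb_torsion(coords, bond, moving_atoms, max_angle=math.pi / 6, rng=random):
    """Rotate `moving_atoms` about the bond (i, j) by a random angle.

    Returns the perturbed coordinates and the sampled angle; the angle could
    serve as a self-supervised regression target (hypothetical task design).
    """
    i, j = bond
    angle = rng.uniform(-max_angle, max_angle)
    axis = [b - a for a, b in zip(coords[i], coords[j])]
    new_coords = [list(c) for c in coords]
    for idx in moving_atoms:
        new_coords[idx] = rotate_about_axis(coords[idx], coords[j], axis, angle)
    return new_coords, angle


# Toy four-atom chain (made-up coordinates); atom 3 rotates about bond 1-2.
coords = [[-1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.5, 1.0, 0.0]]
new_coords, angle = perturb_torsion(coords, (1, 2), [3], rng=random.Random(0))
```

Because the moving atom rotates about the bond axis, its distance to both atoms on that axis is unchanged, so the perturbed structure keeps its bond lengths and bond angles while the torsion changes.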
Our model achieves a significantly higher correlation coefficient in binding affinity prediction. Extensive evaluation on the PDBBind v2019 refined set, CASF, and Merck FEP benchmarks confirms the model's robustness and strong generalization across diverse datasets. In addition, the model achieves a classification ROC score above 95% for binding site identification, underscoring its accuracy in pinpointing protein-ligand interaction regions.
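For readers unfamiliar with the two metrics mentioned here, the sketch below shows how a correlation coefficient and a ROC score are typically computed. It assumes the correlation is Pearson's r and that the ROC figure refers to the area under the ROC curve; the abstract states neither explicitly. The function names and the toy inputs are illustrative only and are not results from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between predicted and reference values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative one (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy inputs (not data from the paper).
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise formulation of the AUC is quadratic in the number of examples, which is fine for a small illustration; a rank-based computation would be preferable for per-residue binding site labels at scale.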
This research presents a novel approach that not only enhances the accuracy of binding affinity predictions but also facilitates the identification of binding sites, showcasing the potential of pre-trained models in computational drug design. Data and code are available at https://github.com/MIALAB-RUC/SableBind.