Machine learning on ligand-residue interaction profiles to significantly improve binding affinity prediction.

Affiliation

Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab054.

DOI:10.1093/bib/bbab054

PMID:33758923

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8425425/

Abstract

Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule to a drug target and to prioritize top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models and thereby significantly improve the screening performance in SBVSs. Such a prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods applied before the interaction energy calculation and different ML algorithms. Using six drug targets, each with hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed those of the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38% and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased by about 225% and 73%, respectively. Third, more encouragingly, the average area under the receiver operating characteristic curve over the six targets for GBDT, 0.87, is significantly better than that for Glide, which is only 0.71. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
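The abstract reports five evaluation metrics: mean absolute error and root mean square error (scoring power), squared correlation coefficient and predictive index (ranking power), and the area under the ROC curve (screening power). The sketch below shows one common way to compute them in plain Python. It is not the authors' code. It assumes the predictive index follows the Pearlman and Charifson pairwise-weighted definition, that the squared correlation coefficient is the squared Pearson r, and that the AUC is computed from scores where higher means more likely active. All function names and the toy values are hypothetical.

```python
import math
from itertools import combinations


def mae(y_true, y_pred):
    """Mean absolute error between experimental and predicted affinities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between experimental and predicted affinities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Squared Pearson correlation coefficient (assumed definition)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def predictive_index(y_true, y_pred):
    """Pairwise rank agreement weighted by the experimental difference.

    Assumed definition: each ligand pair contributes +1 if predicted and
    experimental orderings agree, -1 if they disagree, 0 if the predictions
    tie, weighted by |experimental difference|. Ranges from -1 to +1.
    """
    num = 0.0
    den = 0.0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        weight = abs(t_j - t_i)
        direction = (t_j - t_i) * (p_j - p_i)
        agreement = (direction > 0) - (direction < 0)
        num += weight * agreement
        den += weight
    return num / den


def roc_auc(labels, scores):
    """AUC as the probability that a random active outscores a random inactive.

    Assumes label 1 = active, 0 = inactive, and that a HIGHER score means
    "more likely active". Negate energy-like scores (lower is better) first.
    Ties count as half a win.
    """
    actives = [s for l, s in zip(labels, scores) if l == 1]
    inactives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if a > i else 0.5 if a == i else 0.0
        for a in actives
        for i in inactives
    )
    return wins / (len(actives) * len(inactives))


# Toy values (hypothetical, in kcal/mol), for illustration only.
experimental = [-9.0, -8.0, -7.0, -6.0]
predicted = [-8.5, -8.5, -6.0, -6.5]

print(mae(experimental, predicted))               # 0.625
print(rmse(experimental, predicted))
print(r_squared(experimental, predicted))
print(predictive_index(experimental, predicted))  # 0.7
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise AUC is quadratic in the number of compounds, which is fine for a few hundred ligands per target; a rank-based implementation would be preferable for large screening libraries.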
