Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study.

Authors

Thomas Morgan, Smith Robert T, O'Boyle Noel M, de Graaf Chris, Bender Andreas

Affiliations

Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.

Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK.

Publication information

J Cheminform. 2021 May 13;13(1):39. doi: 10.1186/s13321-021-00516-0.

DOI:10.1186/s13321-021-00516-0

PMID:33985583

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8117600/

Abstract

Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a desirable property space. This restricts their application to relatively data-rich targets, neglecting those where too little data is available to train a predictor adequately. Moreover, ligand-based approaches often bias molecule generation towards previously established chemical space, thereby limiting their ability to identify truly novel chemotypes. In this work, we assess the use of molecular docking via Glide (a structure-based approach) as a scoring function to guide the deep generative model REINVENT, and compare model performance and behaviour to a ligand-based scoring function. Additionally, we modify the previously published MOSES benchmarking dataset to remove any induced bias towards non-protonatable groups. We also propose a new metric to measure dataset diversity, which is less confounded by the distribution of heavy atom count than the commonly used internal diversity metric. As the main finding, when optimizing the docking score against DRD2, the model improves predicted ligand affinity beyond that of known DRD2 active molecules. In addition, generated molecules occupy complementary chemical and physicochemical space compared to the ligand-based approach, and novel physicochemical space compared to known DRD2 active molecules. Furthermore, the structure-based approach learns to generate molecules that satisfy crucial residue interactions, which is information only available when taking protein structure into account.
Overall, this work demonstrates the advantage of using molecular docking to guide de novo molecule generation over ligand-based predictors with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target. Practically, this approach has applications in early hit generation campaigns to enrich a virtual library towards a particular target, and also in novelty-focused projects, where de novo molecule generation either has no prior ligand knowledge available or should not be biased by it.
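
The abstract refers to the commonly used internal diversity metric without defining it. As a rough illustration only, the sketch below assumes one common formulation: one minus the mean pairwise Tanimoto similarity across a set of fingerprints. The fingerprints here are hypothetical sets of on-bit indices rather than real molecular fingerprints, and the function names are illustrative. This is not the new diversity metric the authors propose, which the abstract does not specify.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def internal_diversity(fingerprints: list) -> float:
    """One minus the mean Tanimoto similarity over all distinct pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    mean_similarity = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Toy fingerprints (hypothetical on-bit sets, not derived from real molecules).
toy = [{1, 2, 3}, {2, 3, 4}, {7, 8, 9}]
print(round(internal_diversity(toy), 4))
```

Because larger molecules tend to set more fingerprint bits, a pairwise-similarity average of this kind can shift with molecule size, which is consistent with the abstract's remark that the metric is confounded by the heavy atom count distribution.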

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7505/8117600/bd7c07f17b5f/13321_2021_516_Fig1_HTML.jpg
