Human-in-the-loop active learning for goal-oriented molecule generation.

Authors

Nahal Yasmine, Menke Janosch, Martinelli Julien, Heinonen Markus, Kabeshov Mikhail, Janet Jon Paul, Nittinger Eva, Engkvist Ola, Kaski Samuel

Affiliations

Department of Computer Science, Aalto University, 02150, Espoo, Finland.

Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden.

Publication information

J Cheminform. 2024 Dec 9;16(1):138. doi: 10.1186/s13321-024-00924-y.

DOI:10.1186/s13321-024-00924-y

PMID:39654043

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11629536/

Abstract

Machine learning (ML) systems have enabled the modelling of quantitative structure-property relationships (QSPR) and structure-activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical spaces. However, they often struggle to generalize due to the limited scope of the training data. When optimized by generative agents, this limitation can result in the generation of molecules with artificially high predicted probabilities of satisfying target properties, which subsequently fail experimental validation. To address this challenge, we propose an adaptive approach that integrates active learning (AL) and iterative feedback to refine property predictors, thereby improving the outcomes of their optimization by generative AI agents. Our method leverages the Expected Predictive Information Gain (EPIG) criterion to select additional molecules for evaluation by an oracle. This process aims to provide the greatest reduction in predictive uncertainty, enabling more accurate model evaluations of subsequently generated molecules. Recognizing the impracticality of immediate wet-lab or physics-based experiments due to time and logistical constraints, we propose leveraging human experts for their cost-effectiveness and domain knowledge to effectively augment property predictors, bridging gaps in the limited training data. Empirical evaluations through both simulated and real human-in-the-loop experiments demonstrate that our approach refines property predictors to better align with oracle assessments. Additionally, we observe improved accuracy of predicted properties as well as improved drug-likeness among the top-ranking generated molecules. 
SCIENTIFIC CONTRIBUTION: We present an adaptable framework that integrates AL and human expertise to refine property predictors for goal-oriented molecule generation. This approach is robust to noise in human feedback and ensures that navigating chemical space with human-refined predictors leverages human insights to identify molecules that not only satisfy predicted property profiles but also score highly on oracle models. Additionally, it prioritizes practical characteristics such as drug-likeness, synthetic accessibility, and a favorable balance between exploring diverse chemical space and exploiting similarity to existing training data.
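
The abstract names EPIG as the acquisition criterion but gives no implementation details. As a rough illustration only, the sketch below estimates EPIG for a binary property predictor from K posterior samples (for example, ensemble members), following the general definition of EPIG as the expected mutual information between a candidate's label and the labels of target-distribution inputs. The function name `epig_scores`, the array layout, and the toy numbers are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def epig_scores(pool_probs, target_probs, eps=1e-12):
    """Monte Carlo estimate of EPIG for a binary classifier (hypothetical helper).

    pool_probs:   (K, N) positive-class probabilities from K posterior samples
                  for N candidate molecules.
    target_probs: (K, M) positive-class probabilities from the same K samples
                  for M molecules drawn from the target distribution.
    Returns an (N,) array: mutual information between each candidate's label
    and a target label, averaged over the M target molecules.
    """
    # Expand to explicit two-class distributions: (K, N, 2) and (K, M, 2).
    pool = np.stack([1.0 - pool_probs, pool_probs], axis=-1)
    targ = np.stack([1.0 - target_probs, target_probs], axis=-1)
    n_samples = pool.shape[0]

    # Joint predictive p(y, y*) averaged over posterior samples: (N, M, 2, 2).
    joint = np.einsum("knc,kmd->nmcd", pool, targ) / n_samples

    # Product of the marginal predictives p(y) p(y*): (N, M, 2, 2).
    indep = np.einsum("nc,md->nmcd", pool.mean(axis=0), targ.mean(axis=0))

    # KL(joint || product of marginals) is the mutual information per pair.
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(2, 3))
    return mi.mean(axis=1)

# Toy data (assumed): two posterior samples, two candidates, one target molecule.
pool_probs = np.array([[0.9, 0.5],
                       [0.1, 0.5]])   # candidate 0: samples disagree; candidate 1: they agree
target_probs = np.array([[0.9],
                         [0.1]])      # samples also disagree on the target molecule
scores = epig_scores(pool_probs, target_probs)
query_index = int(np.argmax(scores))  # candidate to send to the oracle or expert
```

In this toy case the candidate on which the posterior samples disagree receives the higher score, because labeling it would reduce uncertainty about the target molecule; the candidate on which all samples agree scores zero. In a human-in-the-loop setting, the selected candidate would be the one shown to the expert for feedback.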


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2037/11629536/b0cc14d5dda3/13321_2024_924_Fig1_HTML.jpg
