MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction.

Authors

Urbina Fabio, Lowden Christopher T, Culberson J Christopher, Ekins Sean

Affiliations

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.

Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States.

Publication information

ACS Omega. 2022 May 27;7(22):18699-18713. doi: 10.1021/acsomega.2c01404. eCollection 2022 Jun 7.

DOI:10.1021/acsomega.2c01404

PMID:35694522

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9178760/

Abstract

Generative machine learning models have become widely adopted in drug discovery and other fields to produce new molecules and explore molecular space, with the goal of discovering novel compounds with optimized properties. These generative models are frequently combined with transfer learning or scoring of physicochemical properties to steer generative design, yet they often cannot address a wide variety of potential problems, and they tend to converge into similar molecular space when combined with a scoring function for the desired properties. In addition, the generated compounds may not be synthetically feasible, which limits their usefulness in real-world scenarios. Here, we introduce a suite of automated tools called MegaSyn comprising three components: (1) a new hill-climb algorithm that makes use of SMILES-based recurrent neural network (RNN) generative models; (2) analog generation software; and (3) retrosynthetic analysis coupled with fragment analysis to score molecules for their synthetic feasibility. We show that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike. We describe the development, benchmarking, and testing of this suite of tools and propose, illustrated with multiple RNN-based test case examples, how they might be used to optimize molecules or prioritize promising lead compounds.
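The abstract names a hill-climb algorithm driven by generative models and a scoring function but does not spell out the loop. The sketch below illustrates the general hill-climb pattern only (sample, score, keep the best, refit the model on the best, repeat); it is not the MegaSyn implementation. Everything in it is an assumption made for illustration: the RNN is replaced by a toy unigram token distribution, `TOKENS` is a made-up vocabulary rather than a validated SMILES grammar, and `score` is a hypothetical stand-in for a property or synthetic-feasibility score.

```python
import random
from collections import Counter

# Toy vocabulary (assumption): stands in for SMILES tokens, not a real grammar.
TOKENS = ["C", "N", "O", "c1ccccc1"]

def sample(weights, length, rng):
    """Draw one token sequence from the toy 'generative model' (a unigram distribution)."""
    return tuple(rng.choices(TOKENS, weights=[weights[t] for t in TOKENS], k=length))

def score(seq):
    """Hypothetical property score: reward 'N' tokens, lightly penalise 'O' tokens."""
    return seq.count("N") - 0.5 * seq.count("O")

def refit(elite, smoothing=1.0):
    """Stand-in for fine-tuning: re-estimate token probabilities from the elite set."""
    counts = Counter(tok for seq in elite for tok in seq)
    total = sum(counts[t] + smoothing for t in TOKENS)
    return {t: (counts[t] + smoothing) / total for t in TOKENS}

def hill_climb(rounds=10, n_samples=200, top_k=20, length=8, seed=0):
    rng = random.Random(seed)
    weights = {t: 1.0 / len(TOKENS) for t in TOKENS}  # start from a uniform model
    elite, best_per_round = [], []
    for _ in range(rounds):
        batch = [sample(weights, length, rng) for _ in range(n_samples)]
        # Merge with the previous elite so the best score can never decrease;
        # ties are broken by the sequence itself to keep the ordering deterministic.
        pool = set(elite) | set(batch)
        elite = sorted(pool, key=lambda s: (score(s), s), reverse=True)[:top_k]
        weights = refit(elite)  # steer the model toward high-scoring sequences
        best_per_round.append(score(elite[0]))
    return weights, elite, best_per_round

weights, elite, history = hill_climb()
print("best score per round:", history)
print("final token probabilities:", {t: round(p, 2) for t, p in weights.items()})
```

In a realistic setting, `sample` would draw SMILES strings from a trained RNN, `refit` would fine-tune that network on the retained molecules, and `score` would combine property predictions with a synthetic-feasibility term. Carrying the elite set across rounds is one design choice among several; it keeps the best score monotone but can also encourage the convergence into similar molecular space that the abstract warns about.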

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982a/9178760/ec159fffcd9a/ao2c01404_0002.jpg

Similar articles

SMILES-based deep generative scaffold decorator for de-novo drug design.

J Cheminform. 2020 May 29;12(1):38. doi: 10.1186/s13321-020-00441-8.

AI-Guided Design of MALDI Matrices: Exploring the Electron Transfer Chemical Space for Mass Spectrometric Analysis of Low-Molecular-Weight Compounds.

J Am Soc Mass Spectrom. 2024 Dec 4;35(12):2836-2848. doi: 10.1021/jasms.4c00186. Epub 2024 Oct 14.

Training recurrent neural networks as generative neural networks for molecular structures: how does it impact drug discovery?

Expert Opin Drug Discov. 2022 Oct;17(10):1071-1079. doi: 10.1080/17460441.2023.2134340. Epub 2022 Oct 17.

Integrating synthetic accessibility with AI-based generative drug design.

J Cheminform. 2023 Sep 19;15(1):83. doi: 10.1186/s13321-023-00742-8.

UnCorrupt SMILES: a novel approach to de novo design.

J Cheminform. 2023 Feb 14;15(1):22. doi: 10.1186/s13321-023-00696-x.

FSM-DDTR: End-to-end feedback strategy for multi-objective De Novo drug design using transformers.

Comput Biol Med. 2023 Sep;164:107285. doi: 10.1016/j.compbiomed.2023.107285. Epub 2023 Jul 31.

Scaffold-Constrained Molecular Generation.

J Chem Inf Model. 2020 Dec 28;60(12):5637-5646. doi: 10.1021/acs.jcim.0c01015. Epub 2020 Dec 10.

Actively Searching: Inverse Design of Novel Molecules with Simultaneously Optimized Properties.

J Phys Chem A. 2022 Jan 20;126(2):333-340. doi: 10.1021/acs.jpca.1c08191. Epub 2022 Jan 5.

GEN: highly efficient SMILES explorer using autodidactic generative examination networks.

J Cheminform. 2020 Apr 10;12(1):22. doi: 10.1186/s13321-020-00425-8.

Cited by

Methods in quantitative biology-from analysis of single-cell microscopy images to inference of predictive models for stochastic gene expression.

Phys Biol. 2025 Jun 10;22(4):042001. doi: 10.1088/1478-3975/adda85.

Generate what you can make: achieving in-house synthesizability with readily available resources in de novo drug design.

J Cheminform. 2025 Mar 28;17(1):41. doi: 10.1186/s13321-024-00910-4.

CardioGenAI: a machine learning-based framework for re-engineering drugs for reduced hERG liability.

J Cheminform. 2025 Mar 5;17(1):30. doi: 10.1186/s13321-025-00976-8.

Beyond chemical structures: lessons and guiding principles for the next generation of molecular databases.

Chem Sci. 2024 Nov 28;16(3):1002-1016. doi: 10.1039/d4sc04064c. eCollection 2025 Jan 15.

A comprehensive review of artificial intelligence for pharmacology research.

Front Genet. 2024 Sep 3;15:1450529. doi: 10.3389/fgene.2024.1450529. eCollection 2024.

Predicting the Hallucinogenic Potential of Molecules Using Artificial Intelligence.

ACS Chem Neurosci. 2024 Aug 21;15(16):3078-3089. doi: 10.1021/acschemneuro.4c00405. Epub 2024 Aug 2.

Llamol: a dynamic multi-conditional generative transformer for de novo molecular design.

J Cheminform. 2024 Jun 21;16(1):73. doi: 10.1186/s13321-024-00863-8.

5-chloro-3-(2-(2,4-dinitrophenyl) hydrazono)indolin-2-one: synthesis, characterization, biochemical and computational screening against SARS-CoV-2.

Chem Zvesti. 2024;78(6):3431-3441. doi: 10.1007/s11696-023-03274-5. Epub 2024 Mar 14.

Perspective: The Rapidly Expanding Need for Biosecurity by Design.

Biodes Res. 2022 May 25;2022:9809058. doi: 10.34133/2022/9809058. eCollection 2022.

ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation.

ArXiv. 2023 Dec 4:arXiv:2309.05853v2.
