Urbina Fabio, Lowden Christopher T, Culberson J Christopher, Ekins Sean
Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States.
ACS Omega. 2022 May 27;7(22):18699-18713. doi: 10.1021/acsomega.2c01404. eCollection 2022 Jun 7.
Generative machine learning models are now widely used in drug discovery and other fields to produce new molecules and explore molecular space, with the goal of discovering novel compounds with optimized properties. These models are frequently combined with transfer learning or with scoring of physicochemical properties to steer generative design. However, they often cannot address a wide variety of potential problems, and they tend to converge into similar regions of molecular space when combined with a scoring function for the desired properties. In addition, the generated compounds may not be synthetically feasible, which limits their usefulness in real-world scenarios. Here, we introduce MegaSyn, a suite of automated tools with three components: a new hill-climb algorithm that makes use of SMILES-based recurrent neural network (RNN) generative models; analog generation software; and retrosynthetic analysis coupled with fragment analysis to score molecules for their synthetic feasibility. We show that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike. We describe the development, benchmarking, and testing of this suite of tools and, through multiple test cases, propose how these RNN-based tools might be used to optimize molecules or prioritize promising lead compounds.
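The abstract does not spell out the MegaSyn hill-climb algorithm, so the following is only a generic sketch of the hill-climb idea as it is commonly applied to generative models: sample a batch, score it, keep the best candidates, and nudge the generator toward them. Everything here is an illustrative assumption: the toy token set, the per-token weight table standing in for an RNN, the `score` function, and all parameter values are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

ALPHABET = "CNOF"  # toy token set (assumption); not a real SMILES grammar
LENGTH = 8         # fixed toy string length (assumption)


def sample(weights, rng):
    """Draw one toy 'molecule' string from per-token weights.

    The weight table is a stand-in for a trained RNN generator.
    """
    tokens = list(weights)
    probs = [weights[t] for t in tokens]
    return "".join(rng.choices(tokens, weights=probs, k=LENGTH))


def score(candidate):
    """Hypothetical scoring function: fraction of 'N' tokens in the string."""
    return candidate.count("N") / len(candidate)


def hill_climb(rounds=20, batch=200, keep=20, seed=0):
    """Generic hill-climb loop: sample, score, keep the top, update, repeat."""
    rng = random.Random(seed)
    weights = {t: 1.0 for t in ALPHABET}  # start from a uniform generator
    best_per_round = []
    top = []
    for _ in range(rounds):
        candidates = [sample(weights, rng) for _ in range(batch)]
        top = sorted(candidates, key=score, reverse=True)[:keep]
        # Stand-in for fine-tuning: re-estimate token weights from the
        # top-scoring candidates, with +1 smoothing so no token vanishes.
        counts = Counter("".join(top))
        weights = {t: counts[t] + 1.0 for t in ALPHABET}
        best_per_round.append(score(top[0]))
    return top, best_per_round


if __name__ == "__main__":
    top, history = hill_climb()
    print(top[:3], history[0], history[-1])
```

In a realistic setting the weight table would be replaced by a SMILES-based generative model, the update step by fine-tuning on the retained molecules, and the score by property and synthetic-feasibility terms. The tendency of such a loop to collapse onto similar outputs is the convergence issue the abstract mentions.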