Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden.
Data Science and Modelling, Pharmaceutical Sciences, R&D, AstraZeneca Gothenburg, SE-431 83 Mölndal, Sweden.
J Chem Inf Model. 2023 Apr 10;63(7):1841-1846. doi: 10.1021/acs.jcim.2c01486. Epub 2023 Mar 23.
We introduce the AiZynthTrain Python package for training synthesis models in a robust, reproducible, and extensible way. It contains two pipelines: one creates a template-based one-step retrosynthesis model and the other a RingBreaker model, both of which can be straightforwardly integrated into retrosynthesis software. We train such models on the publicly available reaction data set from the U.S. Patent and Trademark Office (USPTO), and these are the first retrosynthesis models created in a completely reproducible end-to-end fashion, starting with the original reaction data source and ending with trained machine-learning models. In particular, we show that employing new heuristics implemented in the pipeline greatly improves the ability of the RingBreaker model to disconnect ring systems. Furthermore, we demonstrate the robustness of the pipeline by training on a more diverse but proprietary data set. We envisage that this framework will be extended with other synthesis models in the future.