Tu Zhengkai, Choure Sourabh J, Fong Mun Hong, Roh Jihye, Levin Itai, Yu Kevin, Joung Joonyoung F, Morgan Nathan, Li Shih-Cheng, Sun Xiaoqi, Lin Huiqian, Murnin Mark, Liles Jordan P, Struble Thomas J, Fortunato Michael E, Liu Mengjie, Green William H, Jensen Klavs F, Coley Connor W
Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States.
Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States.
Acc Chem Res. 2025 Jun 3;58(11):1764-1775. doi: 10.1021/acs.accounts.5c00155. Epub 2025 May 21.
Conspectus
The advancement of machine learning and the availability of large-scale reaction datasets have accelerated the development of data-driven models for computer-aided synthesis planning (CASP) in the past decade. In this Account, we describe the range of data-driven methods and models that have been incorporated into the newest version of ASKCOS, an open-source software suite for synthesis planning that we have been developing since 2016. This ongoing effort has been driven by the importance of bridging the gap between research and development, making research advances accessible through a free, practical tool. ASKCOS integrates modules for retrosynthetic planning, modules for the complementary capabilities of condition prediction and reaction product prediction, and several supplementary modules and utilities with various roles in synthesis planning. For retrosynthetic planning, we have developed an Interactive Path Planner (IPP) for user-guided search as well as a Tree Builder for automatic planning with two well-known tree search algorithms, Monte Carlo Tree Search (MCTS) and Retro*. Four one-step retrosynthesis models covering template-based and template-free strategies form the basis of retrosynthetic predictions and can be used simultaneously to combine their advantages and propose diverse suggestions. Strategies for assessing the feasibility of proposed reaction steps and evaluating the full pathways are built on top of several pioneering efforts that we have made in the subtasks of reaction condition recommendation, pathway scoring and clustering, and the prediction of reaction outcomes including the major product, impurities, site selectivity, and regioselectivity.
In addition, we have developed auxiliary capabilities in ASKCOS based on our past and ongoing work on solubility prediction and quantum mechanical descriptor prediction, which can provide more insight into the suitability of proposed reaction solvents or the hypothetical selectivity of desired transformations. For each of these capabilities, we highlight its relevance in the context of synthesis planning and present a comprehensive overview of how it builds not only on our work but also on other recent advances in the field. We also describe in detail how chemists can interact with these capabilities via user-friendly interfaces. ASKCOS has assisted hundreds of medicinal, synthetic, and process chemists in their day-to-day tasks by complementing expert decision making and route ideation. We believe that CASP tools are an important part of modern chemistry research and offer ever-increasing utility and accessibility.
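As an illustration of what using several one-step retrosynthesis models simultaneously might look like, the following is a minimal Python sketch. It is not the ASKCOS API: the two model functions, their hard-coded lookup tables, the scores, and the `combine` helper are all hypothetical stand-ins. It assumes each model returns a list of precursor sets with a score, merges suggestions that propose the same precursors, and records which models agreed.

```python
from typing import Callable, Dict, List, Tuple

# A suggestion is (precursor SMILES strings, model score). Hypothetical format.
Suggestion = Tuple[Tuple[str, ...], float]
OneStepModel = Callable[[str], List[Suggestion]]


def template_based_model(target: str) -> List[Suggestion]:
    """Hypothetical stand-in for a template-based one-step model."""
    table = {
        "CC(=O)OCC": [(("CC(=O)O", "CCO"), 0.9), (("CC(=O)Cl", "CCO"), 0.6)],
    }
    return table.get(target, [])


def template_free_model(target: str) -> List[Suggestion]:
    """Hypothetical stand-in for a template-free one-step model."""
    table = {
        "CC(=O)OCC": [(("CCO", "CC(=O)O"), 0.7), (("CC(=O)OC(C)=O", "CCO"), 0.5)],
    }
    return table.get(target, [])


def combine(target: str, models: Dict[str, OneStepModel]) -> List[dict]:
    """Merge suggestions from all models, deduplicating by precursor set."""
    merged: Dict[Tuple[str, ...], dict] = {}
    for name, model in models.items():
        for precursors, score in model(target):
            # Sort precursors so the same set in a different order matches.
            key = tuple(sorted(precursors))
            entry = merged.setdefault(
                key, {"precursors": key, "score": 0.0, "sources": []}
            )
            # Assumption: keep the best score any model assigned.
            entry["score"] = max(entry["score"], score)
            entry["sources"].append(name)
    # Highest-scoring suggestions first.
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)


if __name__ == "__main__":
    models = {
        "template_based": template_based_model,
        "template_free": template_free_model,
    }
    for suggestion in combine("CC(=O)OCC", models):
        print(suggestion)
```

Taking the maximum score is only one possible merging rule. Scores from different models are not necessarily on a comparable scale, so a real system would likely need normalization or a separate ranking step.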