Joung Joonyoung F, Fong Mun Hong, Roh Jihye, Tu Zhengkai, Bradshaw John, Coley Connor W
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, United States.
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, United States.
Angew Chem Int Ed Engl. 2024 Oct 21;63(43):e202411296. doi: 10.1002/anie.202411296. Epub 2024 Sep 2.
Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and, in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates, and we train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, which conventional models often overlook. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation.
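The atom-conservation challenge named in the abstract can be made concrete with a small check on a single elementary step. The following is a minimal sketch, not the authors' implementation: it assumes each species is given as a flat molecular formula string (no parentheses, charges, or isotopes), and the helper names `atom_counts` and `conserves_atoms` are hypothetical.

```python
import re
from collections import Counter

# Assumption: species are flat molecular formulas such as "CH3Br".
# The source does not specify a representation; this one is illustrative only.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms of each element in a flat molecular formula."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def conserves_atoms(reactants: list[str], products: list[str]) -> bool:
    """Return True if both sides of an elementary step contain the same atoms."""
    def total(formulas: list[str]) -> Counter:
        return sum((atom_counts(f) for f in formulas), Counter())

    return total(reactants) == total(products)


# Hypothetical step: bromomethane + hydroxide -> methanol + bromide
# (charges are ignored in this simplified representation).
balanced = conserves_atoms(["CH3Br", "HO"], ["CH4O", "Br"])

# The same step with the bromide dropped from the predicted products.
unbalanced = conserves_atoms(["CH3Br", "HO"], ["CH4O"])
```

A check of this kind could flag predicted steps that lose or gain atoms. A fuller version would presumably also track formal charge and work on structure-level representations rather than formulas.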