Reproducing Reaction Mechanisms with Machine-Learning Models Trained on a Large-Scale Mechanistic Dataset.

Authors

Joung Joonyoung F, Fong Mun Hong, Roh Jihye, Tu Zhengkai, Bradshaw John, Coley Connor W

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, United States.

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, United States.

Publication information

Angew Chem Int Ed Engl. 2024 Oct 21;63(43):e202411296. doi: 10.1002/anie.202411296. Epub 2024 Sep 2.

Abstract

Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation.
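The atom-conservation issue mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' code or data representation: it assumes each species is given as a plain molecular formula string, and the helper names `count_atoms`, `total_atoms`, and `conserves_atoms` are hypothetical. The idea is simply that a predicted elementary step is suspect if the element counts on the two sides differ.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2", "H", "Cl3".
# Assumption: flat formulas only (no parentheses, charges, or isotopes).
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(formula: str) -> Counter:
    """Return element counts for a flat molecular formula such as 'C2H6O'."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def total_atoms(formulas) -> Counter:
    """Sum element counts over a list of molecular formulas."""
    total = Counter()
    for formula in formulas:
        total += count_atoms(formula)
    return total


def conserves_atoms(reactants, products) -> bool:
    """True if both sides of a step contain the same atoms."""
    return total_atoms(reactants) == total_atoms(products)


# Fischer esterification: acetic acid + ethanol -> ethyl acetate + water.
balanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])
# The same step with the water by-product omitted is not balanced.
unbalanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"])
```

In this sketch `balanced` evaluates to `True` and `unbalanced` to `False`. A check of this kind would only flag a violation; it says nothing about whether a balanced step is chemically plausible.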
