Reproducing Reaction Mechanisms with Machine-Learning Models Trained on a Large-Scale Mechanistic Dataset.

Author information

Joung Joonyoung F, Fong Mun Hong, Roh Jihye, Tu Zhengkai, Bradshaw John, Coley Connor W

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, United States.

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, United States.

Publication information

Angew Chem Int Ed Engl. 2024 Oct 21;63(43):e202411296. doi: 10.1002/anie.202411296. Epub 2024 Sep 2.

DOI:10.1002/anie.202411296

PMID:38995205

Abstract

Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation.
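The abstract names violations of atom conservation as one failure mode of mechanistic models. As an illustration only, the following minimal sketch checks whether a single predicted step conserves atoms. It is not the authors' method: it assumes species are given as flat molecular formulas (no parentheses, charges, or isotopes), and the helper names `atom_counts` and `is_atom_conserved` are hypothetical.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2", "H", "Cl3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H4O2'."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def is_atom_conserved(reactants, products) -> bool:
    """Return True if both sides of a step contain the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Acetic acid + methanol -> methyl acetate + water (balanced).
print(is_atom_conserved(["C2H4O2", "CH4O"], ["C3H6O2", "H2O"]))  # True
# Same step with the water by-product dropped (atoms are lost).
print(is_atom_conserved(["C2H4O2", "CH4O"], ["C3H6O2"]))  # False
```

A check of this kind only compares element totals; it says nothing about whether the predicted step is chemically reasonable.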

Similar articles

A large-scale reaction dataset of mechanistic pathways of organic reactions.

Sci Data. 2024 Aug 10;11(1):863. doi: 10.1038/s41597-024-03709-y.

Learning to predict chemical reactions.

J Chem Inf Model. 2011 Sep 26;51(9):2209-22. doi: 10.1021/ci200207y. Epub 2011 Sep 2.

ReactionPredictor: prediction of complex chemical reactions at the mechanistic level using machine learning.

J Chem Inf Model. 2012 Oct 22;52(10):2526-40. doi: 10.1021/ci3003039. Epub 2012 Oct 1.

Transfer Learning: Making Retrosynthetic Predictions Based on a Small Chemical Reaction Dataset Scale to a New Level.

Molecules. 2020 May 19;25(10):2357. doi: 10.3390/molecules25102357.

AutoTemplate: enhancing chemical reaction datasets for machine learning applications in organic chemistry.

J Cheminform. 2024 Jun 27;16(1):74. doi: 10.1186/s13321-024-00869-2.

Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.

J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.

Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies.

Chem Sci. 2020 Nov 5;12(3):1163-1175. doi: 10.1039/d0sc04896h. eCollection 2021 Jan 21.

Deductive machine learning models for product identification.

Chem Sci. 2024 Jul 1;15(30):11995-12005. doi: 10.1039/d3sc04909d. eCollection 2024 Jul 31.

Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning.

Nat Commun. 2024 Mar 13;15(1):2250. doi: 10.1038/s41467-024-46364-y.

Cited by

Cross-disciplinary perspectives on the potential for artificial intelligence across chemistry.

Chem Soc Rev. 2025 Apr 25. doi: 10.1039/d5cs00146c.

Atom-based machine learning for estimating nucleophilicity and electrophilicity with applications to retrosynthesis and chemical stability.

Chem Sci. 2025 Feb 25;16(13):5676-5687. doi: 10.1039/d4sc07297a. eCollection 2025 Mar 26.

Chemically Informed Deep Learning for Interpretable Radical Reaction Prediction.

J Chem Inf Model. 2025 Feb 10;65(3):1228-1242. doi: 10.1021/acs.jcim.4c01901. Epub 2025 Jan 28.
