Completing and Balancing Database Excerpted Chemical Reactions with a Hybrid Mechanistic-Machine Learning Approach.

Author information

Zhang Chonghuan, Arun Adarsh, Lapkin Alexei A

Affiliations

Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.

Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd., 1 CREATE Way, CREATE Tower #05-05, Singapore 138602 Singapore.

Publication information

ACS Omega. 2024 Apr 10;9(16):18385-18399. doi: 10.1021/acsomega.4c00262. eCollection 2024 Apr 23.

DOI:10.1021/acsomega.4c00262

PMID:38680356

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11044172/

Abstract

Developing reaction routes with computer-aided synthesis planning (CASP) requires knowledge of complete reaction structures. However, most reactions in current databases are missing reaction coparticipants. Although reaction prediction and atom-mapping tools can predict the major reaction participants and trace atom rearrangements, they fail to identify the missing molecules needed to complete the reactions. This is because these approaches are data-driven models trained on current reaction databases, which consist of incomplete reactions. In this work, a workflow was developed to tackle the reaction completion challenge. It includes a heuristic-based method that identifies balanced reactions in reaction databases and completes some imbalanced reactions by adding candidate molecules. A masked language model (MLM) was trained on the simplified molecular input line entry system (SMILES) sentences of these completed reactions. The model predicts the missing molecules of incomplete reactions, analogous to predicting missing words in a sentence. It is promising for predicting small- and medium-sized missing molecules in incomplete reaction records. Together, the heuristic and machine learning methods completed more than half of the entire reaction space.
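The two ideas in the abstract, a heuristic that checks atom balance and completes imbalanced reactions by adding candidate molecules, and an MLM that learns from reaction "sentences" with a missing molecule, can be illustrated with the sketch below. It is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: molecules are given as simple molecular formulas rather than SMILES (so no cheminformatics library is needed), `CANDIDATES` is a hypothetical list of small coparticipants, only the product side is extended, and `mask_molecule` is a hypothetical helper that replaces one whole molecule with a `[MASK]` token in a `reactants>>products` string that has no agents field.

```python
import re
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical list of small coparticipants to try (not the authors' list).
CANDIDATES = ["H2O", "HCl", "CO2", "N2", "H2"]

def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_counts(formulas) -> Counter:
    """Sum the atom counts of all molecules on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += parse_formula(formula)
    return total

def is_balanced(reactants, products) -> bool:
    """A reaction is balanced when both sides contain the same atoms."""
    return side_counts(reactants) == side_counts(products)

def complete_products(reactants, products, max_added=2):
    """Heuristic completion: try adding up to max_added candidate molecules
    to the product side; return the completed list, or None if none works."""
    if is_balanced(reactants, products):
        return list(products)
    for k in range(1, max_added + 1):
        for extra in combinations_with_replacement(CANDIDATES, k):
            if is_balanced(reactants, list(products) + list(extra)):
                return list(products) + list(extra)
    return None  # left for a learned model to handle

def mask_molecule(reaction_smiles: str, side: str, index: int, mask: str = "[MASK]") -> str:
    """Hypothetical helper: hide one molecule of a 'reactants>>products'
    SMILES string, producing a masked 'sentence' for MLM-style training."""
    reactants, products = reaction_smiles.split(">>")
    parts = {"reactants": reactants.split("."), "products": products.split(".")}
    parts[side][index] = mask
    return ".".join(parts["reactants"]) + ">>" + ".".join(parts["products"])
```

For example, esterification of acetic acid (C2H4O2) with ethanol (C2H6O) recorded with only ethyl acetate (C4H8O2) as product is completed by adding water, and `mask_molecule("CC(=O)O.CCO>>CCOC(C)=O.O", "products", 1)` hides that water to create a training example. A brute-force search over a short candidate list is used here only because it is easy to read; reactions it cannot balance (returning `None`) correspond to the cases the abstract hands over to the learned model.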


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a5f/11044172/a03eda71874c/ao4c00262_0001.jpg
