

Completing and Balancing Database Excerpted Chemical Reactions with a Hybrid Mechanistic-Machine Learning Approach.

Author information

Zhang Chonghuan, Arun Adarsh, Lapkin Alexei A

Affiliations

Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.

Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd., 1 CREATE Way, CREATE Tower #05-05, Singapore 138602 Singapore.

Publication information

ACS Omega. 2024 Apr 10;9(16):18385-18399. doi: 10.1021/acsomega.4c00262. eCollection 2024 Apr 23.

Abstract

Developing reaction routes with computer-aided synthesis planning (CASP) requires an understanding of complete reaction structures. However, most reactions in current databases are missing reaction coparticipants. Although reaction prediction and atom-mapping tools can predict the major reaction participants and trace atom rearrangements, they cannot identify the missing molecules needed to complete a reaction. This is because these tools are data-driven models trained on current reaction databases, which themselves consist of incomplete reactions. In this work, a workflow was developed to tackle the reaction completion challenge. It includes a heuristic-based method that identifies balanced reactions in reaction databases and completes some imbalanced reactions by adding candidate molecules. A masked language model (MLM) was then trained on the simplified molecular input line entry system (SMILES) sentences of these completed reactions. The model predicts the missing molecules of incomplete reactions, a task analogous to predicting missing words in a sentence. It shows promise for predicting small- and medium-sized missing molecules in incomplete reaction records. By combining the heuristic and machine learning methods, the workflow completed more than half of the entire reaction space.
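The two ideas in the abstract, checking whether a reaction is atom-balanced and completing it with a candidate molecule, and framing a missing molecule as a masked "word" in a reaction SMILES sentence, can be illustrated with a minimal sketch. It rests on several assumptions: molecules are written as simple molecular formulas (no parentheses, charges, or isotopes) rather than SMILES so that the code stays dependency-free (in practice a cheminformatics toolkit such as RDKit would derive element counts from SMILES); reaction SMILES use the plain `reactants>>products` form; and the candidate list, the function names, and the `[MASK]` token are hypothetical. None of this is the authors' actual heuristic, candidate set, or tokenization.

```python
import re
from collections import Counter

# Hypothetical candidate coparticipants; NOT the candidate set used in the paper.
CANDIDATES = ["H2O", "HCl", "HBr", "CO2", "N2", "H2", "NH3", "CH4O"]

def formula_counts(formula: str) -> Counter:
    """Parse a simple molecular formula such as 'C2H6O' into element counts.
    Assumption: no parentheses, charges, or isotopes."""
    counts = Counter()
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(num) if num else 1
    return counts

def side_counts(formulas) -> Counter:
    """Total element counts over all molecules on one side of a reaction."""
    total = Counter()
    for f in formulas:
        total += formula_counts(f)
    return total

def imbalance(reactants, products):
    """Return (atoms missing from the product side, atoms missing from the reactant side).
    Counter subtraction keeps only positive counts."""
    r, p = side_counts(reactants), side_counts(products)
    return r - p, p - r

def complete(reactants, products, candidates=CANDIDATES):
    """Heuristic sketch: if the reaction is balanced, return it unchanged; if the deficit
    on one side exactly matches a single candidate molecule, add it; otherwise return None."""
    need_prod, need_react = imbalance(reactants, products)
    if not need_prod and not need_react:
        return reactants, products
    for cand in candidates:
        c = formula_counts(cand)
        if need_prod == c and not need_react:
            return reactants, products + [cand]
        if need_react == c and not need_prod:
            return reactants + [cand], products
    return None  # not completable with a single candidate

def mask_molecule(reaction_smiles: str, index: int, mask: str = "[MASK]") -> str:
    """Replace the index-th molecule (reactants first, then products) with a mask token,
    mimicking how an MLM input hides one 'word' of a reaction 'sentence'."""
    lhs, rhs = reaction_smiles.split(">>")
    reactant_mols, product_mols = lhs.split("."), rhs.split(".")
    mols = reactant_mols + product_mols
    mols[index] = mask
    n = len(reactant_mols)
    return ".".join(mols[:n]) + ">>" + ".".join(mols[n:])
```

For example, esterification of acetic acid with ethanol recorded without its by-product, `complete(["C2H4O2", "C2H6O"], ["C4H8O2"])`, is completed by appending `"H2O"` to the products, and `mask_molecule("CC(=O)O.CCO>>CCOC(C)=O.O", 3)` hides the water molecule as `[MASK]`, the slot a trained model would be asked to fill. Matching only a single candidate keeps the sketch simple; deficits that need several molecules, or reactions that are imbalanced on both sides, are left unresolved (`None`), which is the kind of case an MLM-based predictor is meant to address.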


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a5f/11044172/a03eda71874c/ao4c00262_0001.jpg
