Chemical Graph-Based Transformer Models for Yield Prediction of High-Throughput Cross-Coupling Reaction Datasets.

Authors

Sato Akinori, Asahara Ryosuke, Miyao Tomoyuki

Affiliations

Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.

Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.

Publication information

ACS Omega. 2024 Sep 17;9(39):40907-40919. doi: 10.1021/acsomega.4c06113. eCollection 2024 Oct 1.

DOI: 10.1021/acsomega.4c06113

PMID: 39372005

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11447720/

Abstract

Reaction yield is an important factor in determining reaction conditions. Recently, many data-driven models for yield prediction using high-throughput experimentation datasets have been reported. In this study, we propose a neural network architecture that predicts the reaction yield from the chemical graphs of the reaction components. The proposed model is a sequential combination of a message-passing neural network and a transformer encoder. The first network converts each reaction component into a molecular matrix; after embeddings of the compound roles in the chemical reaction are added, the second network models the interplay among the reaction components. The predictive ability of the proposed models was compared with that of state-of-the-art yield prediction models using two high-throughput experimental datasets: the Buchwald-Hartwig cross-coupling (BHC) and Suzuki-Miyaura cross-coupling (SMC) reaction datasets. Overall, the models showed high prediction accuracy for the BHC reaction datasets and for some of the extrapolation-oriented SMC reaction datasets. These models also performed well when the training dataset was relatively large. Furthermore, analysis of the poorly predicted reactions in the BHC reaction dataset revealed a limitation of data-driven yield prediction approaches based on chemical structural similarity.
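
The two-stage design described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the hidden width, the number of message-passing steps, the single attention head, the mean pooling, the sigmoid output scaled to 0-100 %, the random untrained weights, and all names (`message_passing`, `self_attention`, `predict_yield`, `make_component`, the role ids) are assumptions made only to show how data might flow from per-molecule graphs, through role embeddings, into an attention layer over all reaction components.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden width (illustrative assumption)

def message_passing(atom_feats, adjacency, w_msg, steps=2):
    """Return a molecular matrix: one row per atom after neighbour aggregation."""
    h = atom_feats
    for _ in range(steps):
        messages = adjacency @ h             # sum of neighbouring atom states
        h = np.tanh((h + messages) @ w_msg)  # update every atom state
    return h

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(x.shape[1]))
    return x + weights @ v

def predict_yield(components, params):
    """components: list of (role_id, atom_feats, adjacency) tuples."""
    rows = []
    for role_id, atom_feats, adjacency in components:
        h = message_passing(atom_feats, adjacency, params["w_msg"])
        rows.append(h + params["role_emb"][role_id])  # mark atoms with the compound role
    x = np.vstack(rows)                               # atoms of all components together
    x = self_attention(x, params["w_q"], params["w_k"], params["w_v"])
    pooled = x.mean(axis=0)                           # mean-pool to one reaction vector
    logit = pooled @ params["w_out"]
    return 100.0 / (1.0 + np.exp(-logit))             # squash into the 0-100 % range

def make_component(role_id, n_atoms):
    """Toy molecule: random atom features on a linear chain of atoms."""
    adjacency = np.eye(n_atoms, k=1) + np.eye(n_atoms, k=-1)
    return role_id, rng.normal(size=(n_atoms, D)), adjacency

# Random, untrained weights; three hypothetical compound roles (ids 0, 1, 2).
params = {name: rng.normal(size=(D, D)) / np.sqrt(D)
          for name in ("w_msg", "w_q", "w_k", "w_v")}
params["role_emb"] = rng.normal(size=(3, D))
params["w_out"] = rng.normal(size=D) / np.sqrt(D)

components = [make_component(0, 6), make_component(1, 4), make_component(2, 5)]
y = predict_yield(components, params)
print(f"predicted yield: {y:.1f} %")
```

In this sketch the role embedding is the only signal that tells the attention layer which compound an atom belongs to, so reordering the components leaves the prediction unchanged. Whether the published model shares that property depends on details the abstract does not give.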

Abstract figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f819/11447720/8beaeea6d0ae/ao4c06113_0001.jpg
