Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C-O Couplings.

Affiliations

LBM, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France.

PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France.

Publication information

J Am Chem Soc. 2022 Aug 17;144(32):14722-14730. doi: 10.1021/jacs.2c05302. Epub 2022 Aug 8.

DOI:10.1021/jacs.2c05302

PMID:35939717

Abstract

Synthetic yield prediction using machine learning is intensively studied. Previous work has focused on two categories of data sets: high-throughput experimentation data, as an ideal case study, and data sets extracted from proprietary databases, which are known to have a strong reporting bias toward high yields. However, predicting yields using published reaction data remains elusive. To fill the gap, we built a data set on nickel-catalyzed cross-couplings extracted from organic reaction publications, including scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the exploration of the chemical space by the synthetic community. While machine learning models still fail to perform out-of-sample predictions, this work shows that adding chemical knowledge enables fair predictions in a low-data regime. Eventually, we hope that this unique public database will foster further improvements of machine learning methods for reaction yield prediction in a more realistic context.

Similar articles

Predicting Reaction Yields via Supervised Learning.

Acc Chem Res. 2021 Apr 20;54(8):1856-1865. doi: 10.1021/acs.accounts.0c00770. Epub 2021 Mar 31.

Machine Learning C-N Couplings: Obstacles for a General-Purpose Reaction Yield Prediction.

ACS Omega. 2023 Jan 11;8(3):3017-3025. doi: 10.1021/acsomega.2c05546. eCollection 2023 Jan 24.

Cross-Couplings Using Aryl Ethers via C-O Bond Activation Enabled by Nickel Catalysts.

Acc Chem Res. 2015 Jun 16;48(6):1717-26. doi: 10.1021/acs.accounts.5b00051. Epub 2015 Jun 3.

Machine Learning for Chemical Reactivity: The Importance of Failed Experiments.

Angew Chem Int Ed Engl. 2022 Jul 18;61(29):e202204647. doi: 10.1002/anie.202204647. Epub 2022 Jun 7.

Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction.

Mol Inform. 2022 Dec;41(12):e2200043. doi: 10.1002/minf.202200043. Epub 2022 Jul 14.

A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C-N couplings.

Science. 2023 Sep;381(6661):965-972. doi: 10.1126/science.adg2114. Epub 2023 Aug 31.

Predicting reaction performance in C-N cross-coupling using machine learning.

Science. 2018 Apr 13;360(6385):186-190. doi: 10.1126/science.aar5169. Epub 2018 Feb 15.

Decarbonylative Cross-Couplings: Nickel Catalyzed Functional Group Interconversion Strategies for the Construction of Complex Organic Molecules.

Acc Chem Res. 2018 May 15;51(5):1185-1195. doi: 10.1021/acs.accounts.8b00023. Epub 2018 Apr 13.

Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie's 15-Year Parallel Library Data Set.

J Am Chem Soc. 2024 Jun 5;146(22):15070-15084. doi: 10.1021/jacs.4c00098. Epub 2024 May 20.

Cited by

Predicting reaction conditions: a data-driven perspective.

Chem Sci. 2025 Aug 6. doi: 10.1039/d5sc03045e.

Organometallic-type reactivity of stable organoboronates for selective (hetero)arene C-H/C-halogen borylation and beyond.

Nat Commun. 2025 Jul 1;16(1):5458. doi: 10.1038/s41467-025-60674-9.

Molecular Machine Learning Approach to Enantioselective C-H Bond Activation Reactions: From Generative AI to Experimental Validation.

Chem Sci. 2025 Jun 10. doi: 10.1039/d5sc01098e.

Study on Phosphorus Compound/Catechol-Catalyzed Dehydrative Amidation and Its Database Development for Machine Learning.

Chemistry. 2025 Aug 1;31(43):e202500955. doi: 10.1002/chem.202500955. Epub 2025 Jun 11.

Towards global reaction feasibility and robustness prediction with high throughput data and Bayesian deep learning.

Nat Commun. 2025 May 15;16(1):4522. doi: 10.1038/s41467-025-59812-0.

Local reaction condition optimization via machine learning.

J Mol Model. 2025 Apr 23;31(5):143. doi: 10.1007/s00894-025-06365-0.

A meta-learning approach for selectivity prediction in asymmetric catalysis.

Nat Commun. 2025 Apr 16;16(1):3599. doi: 10.1038/s41467-025-58854-8.

Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data.

Nat Commun. 2025 Mar 16;16(1):2587. doi: 10.1038/s41467-025-56905-8.

Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates.

J Am Chem Soc. 2025 Mar 5;147(9):7476-7484. doi: 10.1021/jacs.4c15902. Epub 2025 Feb 21.

Predicting and Explaining Yields with Machine Learning for Carboxylated Azoles and Beyond.

J Chem Inf Model. 2025 Feb 24;65(4):1862-1872. doi: 10.1021/acs.jcim.4c02336. Epub 2025 Feb 7.
