Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C-O Couplings.

Affiliations

LBM, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France.

PASTEUR, Département de Chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France.

Publication information

J Am Chem Soc. 2022 Aug 17;144(32):14722-14730. doi: 10.1021/jacs.2c05302. Epub 2022 Aug 8.

Abstract

Synthetic yield prediction using machine learning is intensively studied. Previous work has focused on two categories of data sets: high-throughput experimentation data, as an ideal case study, and data sets extracted from proprietary databases, which are known to have a strong reporting bias toward high yields. However, predicting yields using published reaction data remains elusive. To fill the gap, we built a data set on nickel-catalyzed cross-couplings extracted from organic reaction publications, including scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the exploration of the chemical space by the synthetic community. While machine learning models still fail to perform out-of-sample predictions, this work shows that adding chemical knowledge enables fair predictions in a low-data regime. Eventually, we hope that this unique public database will foster further improvements of machine learning methods for reaction yield prediction in a more realistic context.

