Machine Learning C-N Couplings: Obstacles for a General-Purpose Reaction Yield Prediction.

Author information

Fitzner Martin, Wuitschik Georg, Koller Raffael, Adam Jean-Michel, Schindler Torsten

Affiliations

Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, CH-4070 Basel, Switzerland.

Merck KGaA, Science and Technology Office, Digital Chemistry, Frankfurter Str. 250, 64293 Darmstadt, Germany.

Publication information

ACS Omega. 2023 Jan 11;8(3):3017-3025. doi: 10.1021/acsomega.2c05546. eCollection 2023 Jan 24.

Abstract

Pd-catalyzed C-N couplings are commonplace in academia and industry. Despite their significance, finding reaction conditions that give, for instance, a high yield remains a challenging and time-consuming task that usually requires screening many sets of conditions. Machine learning is an emerging and promising technique for selecting reaction conditions within the vast space of reagent combinations. In this work, we assess whether the reaction yield of C-N couplings can be predicted from databases of chemical reactions. We test the generalizability of models both on challenging data splits and on a dedicated experimental test set. We find that the models perform well as long as the chemical space represented by the training set is not left. However, the applicability domain is quickly left even for simple reactions of the same type, such as those in our plate test set. The results show that yield prediction for new reactions is possible from the algorithmic side but is hindered in practice by the available data. Most importantly, general-purpose prediction of reaction yields requires more data covering the diversity of reagents. Our findings also expose a challenge for the field: judging models on test sets split off from the same literature data used for training appears to be highly misleading, even when challenging splits are considered.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b346/9878668/0b931a8ef721/ao2c05546_0002.jpg
