Probing machine learning models based on high throughput experimentation data for the discovery of asymmetric hydrogenation catalysts.

Author information

Kalikadien Adarsh V, Valsecchi Cecile, van Putten Robbert, Maes Tor, Muuronen Mikko, Dyubankova Natalia, Lefort Laurent, Pidko Evgeny A

Affiliations

Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9, 2629 HZ Delft The Netherlands

Discovery, Product Development and Supply, Janssen Cilag S.p.A. Viale Fulvio Testi, 280/6 20126 Milano Italy.

Publication information

Chem Sci. 2024 Jul 16;15(34):13618-13630. doi: 10.1039/d4sc03647f. eCollection 2024 Aug 28.

Abstract

Enantioselective hydrogenation of olefins by Rh-based chiral catalysts has been extensively studied for more than 50 years. Naively, one would expect that everything about this transformation is known and that selecting a catalyst that induces the desired reactivity or selectivity is a trivial task. Nonetheless, ligand engineering or selection for any new prochiral olefin remains an empirical trial-and-error exercise. In this study, we investigated whether machine learning techniques could be used to accelerate the identification of the most efficient chiral ligand. For this purpose, we used high throughput experimentation to build a large dataset of results for Rh-catalyzed asymmetric olefin hydrogenation, specifically designed for applications in machine learning. We showed that it aligns with the existing literature and addressed the discrepancies we observed. Additionally, we created a computational framework for the automated and reproducible quantum-chemistry-based featurization of catalyst structures. Together with less computationally demanding representations, these descriptors were fed into our machine learning pipeline for both out-of-domain and in-domain prediction tasks of selectivity and reactivity. For out-of-domain purposes, our models provided limited efficacy. We found that even the most expensive descriptors do not impart significant meaning to the model predictions. The in-domain application, while partly successful for predictions of conversion, emphasizes the need for evaluating the cost-benefit ratio of computationally intensive descriptors and for tailored descriptor design. Challenges persist in predicting enantioselectivity, calling for caution in interpreting results from small datasets. Our insights underscore the importance of dataset diversity with broad substrate inclusion and suggest that mechanistic considerations could improve the accuracy of statistical models.
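The abstract contrasts in-domain and out-of-domain prediction tasks. A minimal sketch of one common way to set these up follows, assuming a random split for "in-domain" (substrates seen in training can reappear at test time) and a leave-one-substrate-out split for "out-of-domain" (the test substrate is never seen in training). The paper's exact protocol may differ. The records, the substrate/ligand IDs, and the ligand-mean baseline are hypothetical illustrations, not the authors' data, descriptors, or models.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (substrate_id, ligand_id, conversion in %).
# The numbers are invented for illustration and are not data from the paper.
RECORDS = [
    ("S1", "L1", 95.0), ("S1", "L2", 40.0), ("S1", "L3", 70.0),
    ("S2", "L1", 90.0), ("S2", "L2", 35.0), ("S2", "L3", 65.0),
    ("S3", "L1", 20.0), ("S3", "L2", 85.0), ("S3", "L3", 50.0),
]


def in_domain_split(records, test_fraction=0.33, seed=0):
    """Random split: the same substrates may appear in both train and test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def out_of_domain_split(records, held_out_substrate):
    """Leave-one-substrate-out: the test substrate never occurs in training."""
    train = [r for r in records if r[0] != held_out_substrate]
    test = [r for r in records if r[0] == held_out_substrate]
    return train, test


def fit_ligand_mean(train):
    """Hypothetical baseline: predict each ligand's mean training conversion."""
    by_ligand = defaultdict(list)
    for _, ligand, y in train:
        by_ligand[ligand].append(y)
    global_mean = mean(y for _, _, y in train)
    # Unseen ligands fall back to the global training mean.
    return lambda ligand: mean(by_ligand[ligand]) if ligand in by_ligand else global_mean


def mae(model, test):
    """Mean absolute error (percentage points) of the model on a test set."""
    return mean(abs(model(ligand) - y) for _, ligand, y in test)


if __name__ == "__main__":
    train, test = out_of_domain_split(RECORDS, "S3")
    print("out-of-domain MAE:", mae(fit_ligand_mean(train), test))
    train, test = in_domain_split(RECORDS)
    print("in-domain MAE:", mae(fit_ligand_mean(train), test))
```

In this toy data, substrate S3 responds to the ligands differently from S1 and S2. The ligand-mean baseline trained on S1 and S2 therefore misses S3 by roughly 46 percentage points on average. This illustrates, under the stated assumptions only, why a model that never sees a substrate can generalize poorly to it.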

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d7d/11352728/5b653cd8cd6e/d4sc03647f-f1.jpg