Rani Neha, Kumar Rohit, Mazumder Shivnath
Department of Chemistry, Indian Institute of Technology Jammu, Jammu 181221, India.
Novartis, HITEC City, Hyderabad, Telangana 500081, India.
J Phys Chem A. 2024 Dec 5;128(48):10452-10463. doi: 10.1021/acs.jpca.4c06701. Epub 2024 Nov 21.
Enantioselective catalytic reactions have a significant impact on chemical synthesis and are important components of an experimental chemist's toolbox. However, the development of asymmetric catalysts often relies on the chemical intuition and experience of a synthetic chemist, making the process both time-consuming and resource-intensive. Machine-learning-assisted reaction discovery can serve as an efficient platform for obtaining high-performing catalysts in a time-economical manner without extensive experimentation. Herein, we report a data-driven machine learning method for reliably predicting the enantiomeric excess (%ee) of 211 asymmetric Pauson-Khand reactions (PKR 1-PKR 211) between 45 unique 1,6-enyne substrates and 12 unique axially chiral biaryl ligands under different reaction conditions, such as varying CO gas pressure, temperature, and solvent polarity. Four machine learning algorithms were studied: extreme gradient boosting (XGBoost), random forest (RF), light gradient boosting machine (LGBM), and neural network (NN). Fivefold cross-validation was applied to our k-means SMOTE-augmented data set to obtain the optimized hyperparameters on the training set, and these parameters were subsequently used on the test data set. For the out-of-box set, XGBoost is found to be superior among the four machine learning methods investigated. Our out-of-box samples comprise 12 unique asymmetric Pauson-Khand reactions (PKR 212-PKR 223) arising from three new 1,3-benzodioxole-based SEGPHOS catalysts that were never included in the training set. The XGBoost algorithm shows an impressive root mean square error (RMSE) of 7.06 (±1.11) in predicting %ee. The XGBoost-predicted %ee values match reasonably well with the experimental results.
The absolute difference between the experimental and XGBoost-calculated %ee values ranges from 0.9 to 7.6 for the majority of the out-of-box Pauson-Khand reactions. The reactions with the fluoro-substituted SEGPHOS ligand show smaller deviations from the experimental %ee values than the reactions with catalysts whose benzodioxole units do not have fluorine atoms. Finally, we have discovered a library of 3357 lead reactions with excellent %ee (≥99) by engaging experimentally unknown combinations of the catalysts, substrates, and reaction conditions. The axially chiral biaryl catalysts and enyne substrates present in the library are synthetically accessible. The ligand space in the library is dominated by the tol-BINAP and DTBM-OMe-BIPHEP ligands. The substrate space is predominantly occupied by NTs-tethered, O-tethered, NBn-tethered, and C(COMe)-tethered 1,6-enynes that have an H or methyl group on the alkyne unit. Our newly discovered library can help a synthetic chemist develop a highly enantioselective PKR starting from existing knowledge rather than extensive trial-and-error experimentation.
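As an illustration of the evaluation described above, the following minimal sketch shows how a fivefold cross-validated RMSE with a spread across folds can be computed. It is not the authors' pipeline: the %ee values are synthetic placeholders, the mean-value predictor stands in for XGBoost, and every function name (`rmse`, `k_fold_indices`, `cross_validated_rmse`, `fit_mean`, `predict_mean`) is hypothetical. The abstract does not state what the ± value in 7.06 (±1.11) refers to, so reporting the standard deviation across folds is only one possible reading.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(X, y, fit, predict, k=5, seed=0):
    """Return (mean, standard deviation) of the RMSE over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        scores.append(rmse([y[i] for i in held_out], preds))
    return statistics.mean(scores), statistics.stdev(scores)


# Stand-in model (assumption): always predicts the mean training %ee.
def fit_mean(X_train, y_train):
    return statistics.mean(y_train)


def predict_mean(model, X_test):
    return [model for _ in X_test]


# Synthetic placeholder data: 211 rows to mirror the data set size only.
rng = random.Random(42)
X_demo = [[rng.random(), rng.random()] for _ in range(211)]
y_demo = [rng.uniform(0.0, 100.0) for _ in range(211)]

folds = k_fold_indices(len(y_demo), k=5, seed=0)
mean_rmse, std_rmse = cross_validated_rmse(X_demo, y_demo, fit_mean, predict_mean)
print(f"RMSE: {mean_rmse:.2f} (±{std_rmse:.2f})")
```

Swapping `fit_mean` and `predict_mean` for a real regressor would leave the evaluation loop unchanged. Note that when a data set is augmented by oversampling, synthetic samples should be generated from the training folds only, so that held-out reactions do not leak into training.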