Schoepfer Alexandre A, Laplaza Ruben, Wodrich Matthew D, Waser Jerome, Corminboeuf Clemence
Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
Laboratory of Catalysis and Organic Synthesis, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland.
ACS Catal. 2024 Jun 4;14(12):9302-9312. doi: 10.1021/acscatal.4c02452. eCollection 2024 Jun 21.
Chiral ligands are important components in asymmetric homogeneous catalysis, but their synthesis and screening can be both time-consuming and resource-intensive. Data-driven approaches, in contrast to screening procedures based on intuition, have the potential to reduce the time and resources needed for reaction optimization by more rapidly identifying an ideal catalyst. These approaches, however, are often nontransferable and cannot be applied across different reactions. To overcome this drawback, we introduce a general featurization strategy for bidentate ligands that is coupled with an automated feature selection pipeline and Bayesian ridge regression to perform multivariate linear regression modeling. This approach, which is applicable to any reaction, incorporates electronic, steric, and topological features (rigidity/flexibility, branching, geometry, and constitution) and is well-suited for early-stage ligand optimization. Using only small data sets, our workflow capably predicts the enantioselectivity of four metal-catalyzed asymmetric reactions. Uncertainty estimates provided by Bayesian ridge regression permit the use of Bayesian optimization to efficiently explore pools of prospective ligands. Finally, we constructed the BDL-Cu-2023 data set, composed of 312 bidentate ligands extracted from the Cambridge Structural Database, and screened it with this procedure to identify ligand candidates for a challenging asymmetric oxy-alkynylation reaction.
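The abstract does not give implementation details, so the following is only a minimal sketch, assuming NumPy, of how Bayesian ridge regression produces both a predicted selectivity and an uncertainty for each ligand, and how that uncertainty can drive one Bayesian-optimization step over a pool of candidates. The hyperparameters are estimated with the standard evidence-maximization (MacKay) updates (α is the weight-prior precision, β the noise precision). The function names, the upper-confidence-bound acquisition, and the random toy descriptors are illustrative assumptions, not the authors' workflow, and the automated feature-selection step is omitted.

```python
import numpy as np


def fit_bayesian_ridge(X, y, n_iter=300, tol=1e-8):
    """Bayesian ridge regression with evidence-maximization (MacKay) updates.

    Prior on weights: N(0, I / alpha); Gaussian noise with precision beta.
    Returns the posterior mean/covariance plus the centering terms.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # center, so no intercept weight is needed
    n, d = Xc.shape
    XtX = Xc.T @ Xc
    eig = np.clip(np.linalg.eigvalsh(XtX), 0.0, None)  # eigenvalues of X^T X
    alpha, beta = 1.0, 1.0 / (np.var(yc) + 1e-12)
    for _ in range(n_iter):
        S = np.linalg.inv(alpha * np.eye(d) + beta * XtX)  # posterior covariance
        m = beta * S @ Xc.T @ yc                            # posterior mean
        lam = beta * eig
        gamma = np.sum(lam / (alpha + lam))                 # effective no. of parameters
        resid = np.sum((yc - Xc @ m) ** 2)
        new_alpha = gamma / (m @ m + 1e-12)
        new_beta = (n - gamma) / (resid + 1e-12)
        converged = abs(new_alpha - alpha) < tol and abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    S = np.linalg.inv(alpha * np.eye(d) + beta * XtX)
    m = beta * S @ Xc.T @ yc
    return {"m": m, "S": S, "beta": beta, "x_mean": x_mean, "y_mean": y_mean}


def predict(model, X):
    """Predictive mean and standard deviation (intercept uncertainty ignored)."""
    Xc = np.asarray(X, dtype=float) - model["x_mean"]
    mu = Xc @ model["m"] + model["y_mean"]
    # x^T S x for every row, plus the noise variance 1/beta
    var = 1.0 / model["beta"] + np.einsum("ij,jk,ik->i", Xc, model["S"], Xc)
    return mu, np.sqrt(var)


def select_next(model, X_pool, kappa=2.0):
    """Hypothetical UCB acquisition: favor high predicted value and high uncertainty."""
    mu, sd = predict(model, X_pool)
    return int(np.argmax(mu + kappa * sd))


# Toy example: random descriptors for 50 hypothetical ligands (not real data).
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(50, 3))
true_w = np.array([1.5, -0.8, 0.3])        # made-up linear response for illustration
y_pool = X_pool @ true_w + rng.normal(scale=0.1, size=50)

labeled = list(range(8))                  # pretend 8 ligands were already tested
for _ in range(5):
    model = fit_bayesian_ridge(X_pool[labeled], y_pool[labeled])
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
    pick = unlabeled[select_next(model, X_pool[unlabeled])]
    labeled.append(pick)                  # "run the experiment" for the chosen ligand
```

The posterior mean coincides with a ridge solution whose penalty is α/β, so the Bayesian treatment adds the per-prediction standard deviation at little extra cost; that standard deviation is what an acquisition function needs. In practice, scikit-learn's `BayesianRidge` with `predict(X, return_std=True)` returns comparable quantities; the explicit version above only makes the posterior terms visible.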