Borup Rasmus M, Ree Nicolai, Jensen Jan H
Department of Chemistry, University of Copenhagen, Copenhagen, DK-2100, Denmark.
Beilstein J Org Chem. 2024 Jul 16;20:1614-1622. doi: 10.3762/bjoc.20.144. eCollection 2024.
Determining the pKa values of various C-H sites in organic molecules offers valuable insights for synthetic chemists in predicting reaction sites. As molecular complexity increases, this task becomes more challenging. This paper introduces pKalculator, a quantum chemistry (QM)-based workflow for automatic computations of C-H pKa values, which is used to generate a training dataset for a machine learning (ML) model. The QM workflow is benchmarked against 695 experimentally determined C-H pKa values in DMSO. The ML model is trained on a diverse dataset of 775 molecules with 3910 C-H sites. Our ML model predicts C-H pKa values with a mean absolute error (MAE) and a root mean squared error (RMSE) of 1.24 and 2.15 pKa units, respectively. Furthermore, we apply our model to 1043 pKa-dependent reactions (aldol, Claisen, and Michael) and successfully identify the reaction sites with a Matthews correlation coefficient (MCC) of 0.82.
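For readers unfamiliar with the three metrics quoted above, the following is a minimal sketch of how MAE, RMSE, and MCC are conventionally defined. It is not taken from pKalculator; the function names and the toy numbers are illustrative assumptions, and the binary "is this C-H site the reaction site" labeling is assumed here only to show how MCC is computed.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large outliers more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mcc(labels_true, labels_pred):
    """Matthews correlation coefficient for binary labels (1 = reaction site)."""
    pairs = list(zip(labels_true, labels_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy values, invented for illustration only (not data from the paper).
pka_reference = [31.0, 26.5, 19.0, 35.0]
pka_predicted = [30.0, 27.5, 21.0, 35.0]
site_true = [1, 0, 0, 1, 0, 0]
site_pred = [1, 0, 0, 0, 1, 0]

print(mae(pka_reference, pka_predicted))
print(rmse(pka_reference, pka_predicted))
print(mcc(site_true, site_pred))
```

An RMSE noticeably larger than the MAE, as in the reported 2.15 versus 1.24 pKa units, generally indicates that a minority of sites carry comparatively large errors.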