Stephens Shannon M, Lambert Kyle M
Department of Chemistry and Biochemistry, Old Dominion University, 4501 Elkhorn Ave, Norfolk, Virginia 23529, United States.
J Org Chem. 2025 May 2;90(17):6000-6012. doi: 10.1021/acs.joc.5c00343. Epub 2025 Apr 23.
A supervised machine learning model has been developed that allows for the prediction of site selectivity in late-stage C-H borylations. Model development was accomplished using literature data for the site-selective (≥95%) C-H borylation of 189 unique arene, heteroarene, and aliphatic substrates that feature a total of 971 possible sp² or sp³ C-H borylation sites. The reported experimental data were supplemented with additional chemoinformatic descriptors, computed atomic charges at the C-H borylation sites, and data from parameterization of catalytically active tris-boryl complexes resulting from the combination of seven different Ir-, Ru-, and Rh-based precatalysts with eight different ligands. Of the more than 1600 parameters investigated, the computed atomic charges (e.g., Hirshfeld, ChelpG, and Mulliken charges) on the hydrogen and carbon atoms at the site of borylation were identified as the most important features that allow for the successful prediction of whether a particular C-H bond will undergo a site-selective borylation. The overall accuracy of the developed model was 88.9% ± 2.5%, with precision, recall, and F1 scores of 92-95% for the nonborylating sites and 65-75% for the sites of borylation. The model was demonstrated to be generalizable to molecules outside of the training/test sets using an additional validation set of 12 electronically and structurally diverse systems.
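The gap between the high overall accuracy and the lower scores for the borylation sites is typical of an imbalanced classification task, since only a minority of the 971 candidate C-H sites actually borylate. The minimal sketch below, which uses hypothetical toy labels rather than any data from the study, shows how per-class precision, recall, and F1 are computed with the standard definitions, and how a model can score well overall while doing worse on the rarer "borylated" class. The function names and labels are illustrative assumptions, not the authors' code.

```python
# Hypothetical illustration of per-class classification metrics.
# Labels: 1 = C-H site borylates, 0 = site does not borylate (toy data only).

def accuracy(y_true, y_pred):
    """Fraction of sites whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Ten hypothetical candidate sites, only two of which truly borylate.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("borylated (1):", per_class_metrics(y_true, y_pred, positive=1))
print("nonborylated (0):", per_class_metrics(y_true, y_pred, positive=0))
```

With these toy labels the overall accuracy is 0.8 and the majority class scores 0.875 on each metric, while the minority class scores only 0.5, mirroring the qualitative pattern reported for the nonborylating versus borylating sites.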