Sinha Vivek, Laan Jochem J, Pidko Evgeny A
Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, 2629 HZ, Delft, The Netherlands.
Phys Chem Chem Phys. 2021 Feb 4;23(4):2557-2567. doi: 10.1039/d0cp05281g.
Rapid and accurate prediction of reactivity descriptors of transition metal (TM) complexes is a major challenge for contemporary quantum chemistry. The recently developed GFN2-xTB method, based on density functional tight-binding (DFTB) theory, is suitable for high-throughput calculation of geometries and thermochemistry of TM complexes, albeit with moderate accuracy. Herein we present a data-augmented approach that substantially improves the accuracy of the GFN2-xTB method for the prediction of thermochemical properties, using pKa values of TM hydrides as a representative model example. We constructed a comprehensive database of ca. 200 TM hydride complexes featuring the experimentally measured pKa values as well as the GFN2-xTB-optimized geometries and various computed electronic and energetic descriptors. The GFN2-xTB results were further refined and validated by DFT calculations with the hybrid PBE0 functional. Our results show that although GFN2-xTB performs well in most cases, it fails to adequately describe TM complexes featuring multicarbonyl and multihydride ligand environments. The dataset was analyzed with ordinary least squares (OLS) fitting and was used to construct an automated machine learning (AutoML) approach for the rapid estimation of the pKa of TM hydride complexes. The results show a high predictive power of the very fast AutoML model (RMSE ∼ 2.7 pKa units), comparable to that of the much slower DFT calculations (RMSE ∼ 3 pKa units). The presented data-augmented quantum chemistry-based approach is promising for high-throughput computational screening workflows of homogeneous TM-based catalysts.
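The OLS analysis and RMSE comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the single descriptor, the linear relation, the noise level, and the train/test split are all assumptions made for illustration, and the data are synthetic rather than taken from the paper's database. The helper names `fit_ols_1d` and `rmse` are hypothetical.

```python
import math
import random


def fit_ols_1d(x, y):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic stand-in data (assumption): one computed descriptor per complex
# and a "measured" pKa that depends linearly on it, plus Gaussian noise.
random.seed(0)
descriptor = [random.uniform(-50.0, 50.0) for _ in range(200)]
pka = [20.0 + 0.3 * d + random.gauss(0.0, 1.0) for d in descriptor]

# Fit on the first 150 points and evaluate on the remaining 50.
slope, intercept = fit_ols_1d(descriptor[:150], pka[:150])
predicted = [slope * d + intercept for d in descriptor[150:]]
test_rmse = rmse(pka[150:], predicted)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, test RMSE={test_rmse:.3f}")
```

Evaluating the RMSE on points that were held out of the fit mirrors how predictive power is usually reported; with real data, the descriptor values would come from the GFN2-xTB or DFT calculations and the targets from the experimental pKa values.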