Das Sambit Kumar, Chakraborty Sabyasachi, Ramakrishnan Raghunathan
Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India.
J Chem Phys. 2021 Jan 28;154(4):044113. doi: 10.1063/5.0032713.
First-principles calculation of the standard formation enthalpy, ΔH° (298 K), on the scale required by chemical space exploration is feasible only with density functional approximations (DFAs) and certain composite wave function theories (cWFTs). Unfortunately, the accuracies of popular range-separated hybrid, "rung-4" DFAs, and cWFTs that offer the best accuracy-vs-cost trade-off have until now been established only for datasets predominantly comprising small molecules; their transferability to larger systems remains unclear. In this study, we present an extended benchmark dataset of ΔH° for structurally and electronically diverse molecules. We apply quartile-ranking based on boundary-corrected kernel density estimation to filter outliers and arrive at probabilistically pruned enthalpies of 1694 compounds (PPE1694). For this dataset, we rank the prediction accuracies of G4, G4(MP2), ccCA, CBS-QB3, and 23 popular DFAs using conventional and probabilistic error metrics. We discuss systematic prediction errors and highlight the role that an empirical higher-level correction plays in the G4(MP2) model. Furthermore, we comment on the uncertainties associated with the empirical reference data for atoms and on the systematic errors stemming from them, which grow with molecular size. We believe that these findings will aid in identifying meaningful application domains for quantum thermochemical methods.
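The abstract does not spell out how the quartile-ranking on a boundary-corrected kernel density estimate is carried out, so the following is only a minimal sketch of one way such a filter could look, not the authors' procedure. It assumes that the quantity being ranked is the absolute deviation between predicted and reference ΔH° (non-negative, hence a boundary at zero), that the boundary correction is the simple reflection method, that the bandwidth follows Silverman's rule of thumb, and that outliers lie beyond Q3 + 1.5 × IQR. The names `reflected_kde_cdf`, `kde_quantile`, and `prune_outliers` are hypothetical.

```python
import math
import statistics


def reflected_kde_cdf(samples, bandwidth):
    """CDF of a Gaussian KDE on [0, inf), boundary-corrected by reflection at 0."""
    n = len(samples)

    def phi(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def cdf(x):
        if x <= 0.0:
            return 0.0
        total = 0.0
        for s in samples:
            # Kernel centred at s plus its mirror image at -s, integrated on [0, x].
            total += phi((x - s) / bandwidth) - phi(-s / bandwidth)
            total += phi((x + s) / bandwidth) - phi(s / bandwidth)
        return total / n

    return cdf


def kde_quantile(cdf, p, upper):
    """Invert a monotone CDF on [0, upper] by bisection."""
    lo, hi = 0.0, upper
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prune_outliers(errors, k=1.5):
    """Return (indices kept, cutoff) using KDE-based quartiles of |error|.

    Assumption: Silverman's rule-of-thumb bandwidth and a Q3 + k * IQR fence.
    """
    abs_err = [abs(e) for e in errors]
    q1_s, _, q3_s = statistics.quantiles(abs_err, n=4)
    spread = min(statistics.stdev(abs_err), (q3_s - q1_s) / 1.34)
    h = 0.9 * spread * len(abs_err) ** (-0.2)
    if h <= 0.0:
        raise ValueError("bandwidth must be positive; data have no spread")
    cdf = reflected_kde_cdf(abs_err, h)
    upper = max(abs_err) + 10.0 * h  # the KDE CDF is essentially 1 here
    q1 = kde_quantile(cdf, 0.25, upper)
    q3 = kde_quantile(cdf, 0.75, upper)
    cutoff = q3 + k * (q3 - q1)
    keep = [i for i, a in enumerate(abs_err) if a <= cutoff]
    return keep, cutoff


# Made-up prediction errors in kcal/mol; the 25.0 entry mimics a bad reference value.
demo_errors = [0.5, -1.0, 1.2, 0.8, -0.3, 1.5, -0.9, 0.7, 25.0, 1.1]
kept, fence = prune_outliers(demo_errors)
print("kept indices:", kept)
print("cutoff on |error|:", round(fence, 2))
```

Reflection keeps probability mass from leaking below zero, which would otherwise bias the lower quartile of a non-negative error distribution. The fence multiplier `k` and the bandwidth rule are tunable choices in this sketch and would need to be matched to the published protocol before any comparison with PPE1694.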