Department of Chemistry, Emory University, Atlanta, Georgia 30322.
Phys Chem Chem Phys. 2023 May 17;25(19):13417-13428. doi: 10.1039/d3cp00506b.
Because of the limitations of solvent models, quantum chemistry calculations of solution-phase molecular properties often deviate from experimental measurements. Recently, Δ-machine learning (Δ-ML) was shown to be a promising approach to correcting errors in quantum chemistry calculations of solvated molecules. However, the applicability of this approach to different molecular properties and its performance in various cases are still unknown. In this work, we tested the performance of Δ-ML in correcting redox potential and absorption energy calculations using four types of input descriptors and various ML methods. We sought to understand how Δ-ML performance depends on the property being predicted, the quantum chemistry method, the data set distribution and size, the type of input feature, and the feature selection technique. We found that Δ-ML can effectively correct the errors in redox potentials calculated using density functional theory (DFT) and in absorption energies calculated using time-dependent DFT. For both properties, the Δ-ML-corrected results were less sensitive to the choice of DFT functional than the raw results. The optimal input descriptor depends on the property, regardless of the specific ML method used: the solvent-solute descriptor (SS) is best for redox potential, whereas the combined molecular fingerprint (cFP) is best for absorption energy. A detailed analysis of the feature space and of the physical foundation of the different descriptors explains these observations. Feature selection did not further improve Δ-ML performance. Finally, we analyzed the limitations of our Δ-ML solvent effect approach on data sets whose molecules have varying degrees of electronic structure error.
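The core idea of Δ-ML described above (learn the difference between the experimental value and the calculated value, then add the predicted difference back to the calculation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic random numbers rather than redox potentials or absorption energies, the five-column feature matrix merely stands in for a descriptor such as SS or cFP, ridge regression is swapped in as a simple placeholder for the ML methods tested in the paper, and the helper names `fit_ridge` and `delta_ml_correct` are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def delta_ml_correct(X_train, calc_train, expt_train, X_new, calc_new, alpha=1.0):
    """Learn delta = experiment - calculation, then add it to new calculations."""
    delta_train = expt_train - calc_train
    w, b = fit_ridge(X_train, delta_train, alpha)
    return calc_new + X_new @ w + b


# Synthetic stand-in data (NOT from the paper): the "calculated" values carry
# a systematic, descriptor-dependent error plus a small amount of noise.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))      # hypothetical descriptors
expt = rng.normal(size=n_samples)                 # hypothetical "experiment"
systematic_error = X @ np.array([0.3, -0.2, 0.0, 0.1, 0.05]) + 0.4
calc = expt - systematic_error + rng.normal(scale=0.02, size=n_samples)

# Simple train/test split: first 150 samples for training, last 50 for testing.
train, test = slice(0, 150), slice(150, None)
corrected = delta_ml_correct(X[train], calc[train], expt[train], X[test], calc[test])

raw_mae = np.mean(np.abs(calc[test] - expt[test]))
corrected_mae = np.mean(np.abs(corrected - expt[test]))
print(f"raw MAE: {raw_mae:.3f}, corrected MAE: {corrected_mae:.3f}")
```

Because the synthetic error is linear in the features by construction, a linear model recovers it almost exactly; real solvent-model errors need not be this well behaved, which is why the choice of descriptor and ML method matters in practice.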