Xia Song, Zhang Dongdong, Zhang Yingkai
Department of Chemistry, New York University, New York, New York 10003, United States.
Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States.
J Chem Theory Comput. 2023 Jan 6. doi: 10.1021/acs.jctc.2c01024.
The past few years have witnessed significant advances in machine learning methods for predicting molecular energetics, including electronic energies calculated with high-level quantum mechanical methods and experimental properties such as solvation free energy and logP. Typically, a task-specific machine learning model is developed for each prediction task. In this work, we present a multitask deep ensemble model, sPhysNet-MT-ens5, which can simultaneously and accurately predict electronic energies of molecules in the gas, water, and octanol phases, as well as transfer free energies at both calculated and experimental levels. On the calculated data set Frag20-solv-678k, which is developed in this work and contains 678,916 conformations of molecules with up to 20 heavy atoms, together with their properties calculated at the B3LYP/6-31G* level of theory with continuum solvent models, sPhysNet-MT-ens5 predicts density functional theory (DFT)-level electronic energies directly from force field-optimized geometries within chemical accuracy. On the experimental data sets, sPhysNet-MT-ens5 achieves state-of-the-art performance, predicting experimental hydration free energy with an RMSE of 0.620 kcal/mol on the FreeSolv data set and experimental logP with an RMSE of 0.393 on the PHYSPROP data set. Furthermore, sPhysNet-MT-ens5 provides a reasonable estimate of model uncertainty that correlates with prediction error. Finally, by analyzing the atomic contributions to its predictions, we find that the developed deep learning model is aware of the chemical environment of each atom, assigning atomic contributions that are consistent with chemical knowledge.
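The quantities named in the abstract (an ensemble prediction with an uncertainty estimate, logP obtained from a water-to-octanol transfer free energy, and RMSE) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the ensemble prediction is the mean over the five members and that the uncertainty is their sample standard deviation, neither of which the abstract specifies. The conversion uses the standard thermodynamic relation logP = -ΔG(water→octanol) / (RT ln 10) at an assumed temperature of 298.15 K. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Gas constant in kcal/(mol*K), so free energies are in kcal/mol.
R_KCAL = 1.987204e-3


def ensemble_predict(member_predictions):
    """Combine per-member predictions for one molecule.

    Assumption (not stated in the abstract): the ensemble output is the
    mean over members, and the uncertainty is their sample standard deviation.
    """
    mu = mean(member_predictions)
    sigma = stdev(member_predictions) if len(member_predictions) > 1 else 0.0
    return mu, sigma


def logp_from_transfer_free_energy(dg_water_to_octanol, temperature=298.15):
    """Convert a water-to-octanol transfer free energy (kcal/mol) to logP.

    Uses logP = -dG / (R * T * ln 10); a negative dG (octanol favored)
    gives a positive logP.
    """
    return -dg_water_to_octanol / (R_KCAL * temperature * math.log(10))


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return math.sqrt(mean((p - r) ** 2 for p, r in zip(predicted, reference)))
```

For example, calling `ensemble_predict` on five made-up member outputs for a transfer free energy and passing the resulting mean to `logp_from_transfer_free_energy` gives a logP estimate, while the returned standard deviation serves as a rough confidence indicator that can be compared against the observed error.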