Lu Jianing, Wang Cheng, Zhang Yingkai
Department of Chemistry, New York University, New York, New York 10003, United States.
NYU-ECNU Center for Computational Chemistry, New York University Shanghai, Shanghai 200062, P. R. China.
J Chem Theory Comput. 2019 Jul 9;15(7):4113-4121. doi: 10.1021/acs.jctc.9b00001. Epub 2019 Jun 12.
The use of neural networks to predict molecular properties calculated from high-level quantum mechanical calculations has made significant advances in recent years, but most models need input geometries from DFT optimizations, which limits their applicability in practice. In this work, we explored how machine learning can be used to predict molecular atomization energies and conformation stability using optimized geometries from the Merck Molecular Force Field (MMFF). On the basis of the recently introduced deep tensor neural network (DTNN) approach, we first improved its training efficiency and performed an extensive search of its hyperparameters, and developed a DTNN_7ib model which has a test accuracy of 0.34 kcal/mol mean absolute error (MAE) on the QM9 data set. Then, using atomic vector representations in the DTNN_7ib model, we employed a transfer learning (TL) strategy to train readout layers on the QM9M data set, in which QM properties are the same as in QM9 [calculated at the B3LYP/6-31G(2df,p) level] while molecular geometries are the corresponding local minima optimized with the MMFF94 force field. The developed TL_QM9M model can achieve an MAE of 0.79 kcal/mol using MMFF-optimized geometries. Furthermore, we demonstrated that the same transfer learning strategy with the same atomic vector representation can be used to develop a machine learning model that can achieve an MAE of 0.51 kcal/mol in molecular energy prediction using MMFF geometries for an eMol9_C conformation data set, which consists of 9959 molecules and 88,234 conformations with energies calculated at the B3LYP/6-31G* level. Our results indicate that DFT-level accuracy of molecular energy prediction can be achieved using force-field-optimized geometries and atomic vector representations learned from a deep tensor neural network, and that integrating molecular modeling and machine learning would be a promising approach to developing more powerful computational tools for molecular conformation analysis.
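The transfer learning strategy described in the abstract (keep the learned atomic vector representations fixed and train only the readout layers on a new data set) can be illustrated with a minimal sketch. This is not the authors' DTNN implementation: the atomic vectors here are random placeholders indexed by atom type only, the readout is a single linear layer fit by least squares, and the energies are synthetic. All names (`frozen_atom_vectors`, `molecule_features`, `make_synthetic_set`) are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: stand-in for atomic vectors from a pretrained model.
# One fixed row per atom type (e.g. H, C, N, O, F); values are random.
N_TYPES, DIM = 5, 4
frozen_atom_vectors = rng.normal(size=(N_TYPES, DIM))


def molecule_features(atom_types):
    """Sum the frozen atomic vectors over all atoms of one molecule."""
    return frozen_atom_vectors[atom_types].sum(axis=0)


def make_synthetic_set(n_molecules, true_w, noise=0.05):
    """Build synthetic molecules and energies (illustrative data only)."""
    features, energies = [], []
    for _ in range(n_molecules):
        n_atoms = rng.integers(3, 20)
        atom_types = rng.integers(0, N_TYPES, size=n_atoms)
        x = molecule_features(atom_types)
        features.append(x)
        energies.append(x @ true_w + rng.normal(scale=noise))
    return np.array(features), np.array(energies)


# Hypothetical "true" readout weights used only to generate the toy targets.
true_w = rng.normal(size=DIM)
X_train, y_train = make_synthetic_set(500, true_w)
X_test, y_test = make_synthetic_set(200, true_w)

# Transfer learning step: the atomic vectors stay frozen and only the
# readout weights are fit to the new data set.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Mean absolute error (MAE), the metric reported in the abstract.
mae_readout = np.mean(np.abs(X_test @ w - y_test))
mae_baseline = np.mean(np.abs(y_train.mean() - y_test))

print(f"readout MAE:  {mae_readout:.3f}")
print(f"baseline MAE: {mae_baseline:.3f}")
```

In a real setting the atomic vectors would depend on each atom's chemical environment and the readout would be a small neural network, but the division of labor is the same: the representation is reused as-is, and only the final mapping to energy is retrained on the new geometries.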