Li Ying, Li Hui, Pickard Frank C, Narayanan Badri, Sen Fatih G, Chan Maria K Y, Sankaranarayanan Subramanian K R S, Brooks Bernard R, Roux Benoît
Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, Illinois 60439, United States.
Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois 60637, United States.
J Chem Theory Comput. 2017 Sep 12;13(9):4492-4503. doi: 10.1021/acs.jctc.7b00521. Epub 2017 Sep 1.
Machine learning (ML) techniques combined with the genetic algorithm (GA) have been applied to determine the parameters of a polarizable force field using only ab initio data from quantum mechanics (QM) calculations of molecular clusters at the MP2/6-31G(d,p), DFMP2(fc)/jul-cc-pVDZ, and DFMP2(fc)/jul-cc-pVTZ levels, with the goal of predicting experimental condensed-phase properties (i.e., density and heat of vaporization). The performance of this ML/GA approach is demonstrated on 4943 dimer electrostatic potentials and 1250 cluster interaction energies for methanol. Excellent agreement was achieved between the QM training data set and the optimized force field model. The results were further improved by introducing an offset factor during the machine learning process to compensate for the discrepancy between the QM energies and the energies reproduced by the optimized force field, while maintaining the local "shape" of the QM energy surface. Throughout the machine learning process, experimental observables were not included in the objective function; they were used only for model validation. The best model, optimized from the QM data at the DFMP2(fc)/jul-cc-pVTZ level, appears to perform even better than the original AMOEBA force field (amoeba09.prm), which was optimized empirically to match liquid properties. The present effort shows the feasibility of using machine learning techniques to develop a descriptive polarizable force field from QM data alone. The ML/GA strategy for optimizing force-field parameters described here could easily be extended to other molecular systems.
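The abstract does not specify the functional form, the objective function, or the GA operators, so the following is only a toy sketch under stated assumptions, not the authors' method. A 12-6 Lennard-Jones pair potential stands in for the polarizable AMOEBA model, synthetic reference energies with a constant shift stand in for QM data, and the "offset factor" is assumed to be the constant shift that minimizes the squared error (the mean residual), so that only the shape of the reference curve must be reproduced. The names `lj_energy`, `loss`, and `genetic_fit`, along with all parameter values, are hypothetical.

```python
import random
import statistics

def lj_energy(r, eps, sigma):
    """Toy 12-6 Lennard-Jones pair energy (assumed stand-in, not AMOEBA)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def loss(params, distances, e_ref, use_offset=True):
    """Mean squared error between reference and model energies.

    With use_offset=True, a constant shift delta is removed before scoring.
    The delta minimizing the squared error is the mean residual, so only the
    'shape' of the reference energy curve has to be reproduced.
    """
    eps, sigma = params
    resid = [e - lj_energy(r, eps, sigma) for r, e in zip(distances, e_ref)]
    delta = statistics.fmean(resid) if use_offset else 0.0
    return statistics.fmean((x - delta) ** 2 for x in resid)

def genetic_fit(distances, e_ref, bounds, pop_size=60, generations=200,
                mutation_scale=0.05, seed=0):
    """Minimal GA: truncation selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)

    def clip(ind):
        # Keep every gene inside its (lo, hi) bound.
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(ind, bounds)]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: loss(p, distances, e_ref))
        elite = population[: pop_size // 4]  # best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [x + rng.gauss(0.0, mutation_scale * (hi - lo))
                     for x, (lo, hi) in zip(child, bounds)]  # Gaussian mutation
            children.append(clip(child))
        population = elite + children
    return min(population, key=lambda p: loss(p, distances, e_ref))

# Synthetic "reference" data: eps=0.2, sigma=3.5, shifted by a constant 0.3.
distances = [3.2 + 0.1 * i for i in range(30)]
e_ref = [lj_energy(r, 0.2, 3.5) + 0.3 for r in distances]
best = genetic_fit(distances, e_ref, bounds=[(0.01, 1.0), (2.5, 4.5)])
print(best, loss(best, distances, e_ref))
```

Because the offset is solved in closed form inside the objective, the GA searches only over the physical parameters, and a systematic shift between reference and model energies no longer penalizes an otherwise correct shape.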