Malshe M, Pukrittayakamee A, Raff L M, Hagan M, Bukkapatnam S, Komanduri R
Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, Oklahoma 74078, USA.
J Chem Phys. 2009 Sep 28;131(12):124127. doi: 10.1063/1.3231686.
A novel method is presented that significantly reduces the computational bottleneck of executing high-level electronic structure calculations of the energies and their gradients for a large database that adequately samples the configuration space of importance for systems containing more than four atoms that are undergoing multiple, simultaneous reactions in several energetically open channels. The basis of the method is the high degree of correlation that generally exists between the Hartree-Fock (HF) and higher-level electronic structure energies. It is shown that if the input vector to a neural network (NN) includes both the configuration coordinates and the HF energies of a small subset of the database, MP4(SDQ) energies with the same basis set can be predicted for the entire database using only the HF and MP4(SDQ) energies for the small subset and the HF energies for the remainder of the database. The predictive error is shown to be less than or equal to the fitting error obtained when a NN is fitted to the entire database of higher-level electronic structure energies. The general method is applied to the computation of MP4(SDQ) energies of 68,308 configurations that comprise the database for the simultaneous, unimolecular decomposition of vinyl bromide into six different reaction channels. The predictive accuracy of the method is investigated by employing successively smaller subsets of the database to train the NN to predict the MP4(SDQ) energies of the remaining configurations of the database. The results indicate that for this system, the subset can be as small as 8% of the total number of configurations in the database without loss of accuracy beyond that expected if a NN is employed to fit the higher-level energies for the entire database. The use of this procedure is shown to save about 78% of the total computational time required for the execution of the MP4(SDQ) calculations. The sampling error involved in selecting the subset is shown to be about 10% of the predictive error for the higher-level energies. A practical procedure for using the method is outlined. It is suggested that the method will be equally applicable to the prediction of electronic structure energies computed using even higher-level methods than MP4(SDQ).
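The central idea, feeding a cheap low-level energy to the network alongside the coordinates and training only on a small subset that also has the expensive energy, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the database is synthetic (three toy coordinates, with made-up `e_low` and `e_high` functions standing in for the HF and MP4(SDQ) energies), and the network is a single tanh hidden layer with fixed random weights plus a linear pass-through, whose output weights are obtained by ridge regression rather than by the training procedure used in the paper. Only the 8% subset fraction is taken from the abstract.

```python
import numpy as np


def make_toy_database(n, rng):
    """Synthetic stand-in for a database of configurations (hypothetical)."""
    x = rng.uniform(-1.0, 1.0, size=(n, 3))            # toy coordinates
    # Cheap "low-level" energy, playing the role of the HF energy.
    e_low = np.sum(x**2, axis=1) + 0.3 * np.sin(3.0 * x[:, 0])
    # Expensive "high-level" energy, strongly correlated with e_low.
    e_high = 1.05 * e_low + 0.1 * np.cos(2.0 * x[:, 1]) - 0.05
    return x, e_low, e_high


def fit_predict(inputs, targets, train_idx, rng, n_hidden=50, ridge=1e-3):
    """Fit on the training subset only, then predict for every row.

    Assumption: random tanh hidden layer with a linear pass-through;
    output weights come from ridge regression, not backpropagation.
    """
    mean = inputs[train_idx].mean(axis=0)
    std = inputs[train_idx].std(axis=0)
    z = (inputs - mean) / std                           # standardized inputs
    w = rng.normal(size=(z.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    h = np.hstack([np.tanh(z @ w + b), z, np.ones((z.shape[0], 1))])
    h_train = h[train_idx]
    a = h_train.T @ h_train + ridge * np.eye(h.shape[1])
    coef = np.linalg.solve(a, h_train.T @ targets[train_idx])
    return h @ coef


def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))


rng = np.random.default_rng(0)
x, e_low, e_high = make_toy_database(2000, rng)

# Only 8% of the configurations are assumed to have a high-level energy.
n_train = len(x) * 8 // 100
perm = rng.permutation(len(x))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Input vector = coordinates plus the low-level energy, as in the abstract.
inputs_with = np.column_stack([x, e_low])
pred_with = fit_predict(inputs_with, e_high, train_idx, rng)

# Baseline for comparison: coordinates only.
pred_without = fit_predict(x, e_high, train_idx, rng)

rmse_with = rmse(pred_with[test_idx], e_high[test_idx])
rmse_without = rmse(pred_without[test_idx], e_high[test_idx])
print(f"held-out RMSE, coordinates + low-level energy: {rmse_with:.4f}")
print(f"held-out RMSE, coordinates only:               {rmse_without:.4f}")
```

The held-out rows play the role of the configurations for which only the low-level energy is available. Numbers from this toy setup say nothing about the accuracy reported for the vinyl bromide database.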