Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States.
Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico 87185, United States.
J Chem Inf Model. 2023 Apr 24;63(8):2281-2295. doi: 10.1021/acs.jcim.2c01617. Epub 2023 Apr 12.
This paper focuses on the development of multifidelity modeling approaches using neural network surrogates, in which training data arising from multiple model forms and resolutions are integrated to predict high-fidelity response quantities of interest at lower cost. We focus on the context of quantum chemistry and the integration of information from multiple levels of theory. Important foundations include (i) the use of symmetry-function-based atomic energy vector constructions as feature vectors for representing structures across families of molecules and (ii) single-fidelity neural network training capabilities that learn the relationships needed to map feature vectors to potential energy predictions. These foundations are embedded within several multifidelity topologies that decompose the high-fidelity mapping into model-based components, including sequential formulations that admit a general nonlinear mapping across fidelities and discrepancy-based formulations that presume an additive decomposition. The methodologies are first explored and demonstrated on a pair of simple analytical test problems and then deployed for potential energy prediction for CH using B2PLYP-D3/6-311++G(d,p) for high-fidelity simulation data and Hartree-Fock/6-31G for low-fidelity data. For the common case of limited access to high-fidelity data, our computational results demonstrate that multifidelity neural network potential energy surface constructions achieve roughly an order-of-magnitude improvement, either as a reduction in test error for equivalent total simulation cost or as a reduction in total cost for equivalent error.
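The contrast between the sequential and discrepancy-based topologies can be illustrated on a one-dimensional toy problem. What follows is a minimal sketch, not the authors' implementation. It assumes a Forrester-type analytical pair (the hypothetical functions `f_high` and `f_low`), which may differ from the paper's own test problems, and it replaces each neural network with a linear least-squares fit so that the block needs only the Python standard library. The discrepancy form learns f_H(x) ≈ f_L(x) + δ(x); the sequential form learns f_H(x) ≈ g(x, f_L(x)), with g taken to be affine in its inputs as a stand-in for a general nonlinear network. All helper names (`least_squares`, `predict_discrepancy`, `predict_sequential`) are illustrative.

```python
import math


def f_high(x):
    # Assumed high-fidelity model: Forrester-type 1D test function (illustrative only)
    return (6.0 * x - 2.0) ** 2 * math.sin(12.0 * x - 4.0)


def f_low(x):
    # Assumed low-fidelity model: a scaled, tilted, shifted copy of f_high
    return 0.5 * f_high(x) + 10.0 * (x - 0.5) - 5.0


def solve(A, b):
    """Solve a small dense system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - acc) / M[r][r]
    return z


def least_squares(features, targets):
    """Hypothetical helper: minimize sum((phi . w - t)^2) via the normal equations."""
    p = len(features[0])
    A = [[sum(phi[i] * phi[j] for phi in features) for j in range(p)] for i in range(p)]
    b = [sum(phi[i] * t for phi, t in zip(features, targets)) for i in range(p)]
    return solve(A, b)


# A handful of expensive high-fidelity samples (the scarce resource)
x_hf = [0.0, 0.4, 0.6, 1.0]
y_hf = [f_high(x) for x in x_hf]

# Discrepancy form: f_H(x) ~ f_L(x) + delta(x), with delta(x) = d0 + d1*x
d = least_squares([[1.0, x] for x in x_hf],
                  [yh - f_low(x) for x, yh in zip(x_hf, y_hf)])


def predict_discrepancy(x):
    return f_low(x) + d[0] + d[1] * x


# Sequential form: f_H(x) ~ g(x, f_L(x)), here g(x, y) = s0 + s1*x + s2*y
s = least_squares([[1.0, x, f_low(x)] for x in x_hf], y_hf)


def predict_sequential(x):
    return s[0] + s[1] * x + s[2] * f_low(x)


# Dense test grid; f_low is treated as cheap and evaluated everywhere
x_test = [i / 100.0 for i in range(101)]
err_disc = max(abs(predict_discrepancy(x) - f_high(x)) for x in x_test)
err_seq = max(abs(predict_sequential(x) - f_high(x)) for x in x_test)
print("sequential coefficients:", [round(c, 6) for c in s])
print(f"max test error, discrepancy: {err_disc:.3f}")
print(f"max test error, sequential:  {err_seq:.2e}")
```

Because this assumed low-fidelity function is a scaled copy of the high-fidelity one, an additive correction in x alone cannot absorb the factor of 2 between them, whereas the sequential map recovers f_H = 2 f_L − 20x + 20 from only four high-fidelity samples. This is one way to see why admitting a mapping across fidelities can pay off when the levels are not related by a simple offset.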