Zhai Yaoguang, Caruso Alessandro, Gao Sicun, Paesani Francesco
Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA.
Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA.
J Chem Phys. 2020 Apr 14;152(14):144103. doi: 10.1063/5.0002162.
The efficient selection of representative configurations that are used in high-level electronic structure calculations needed for the development of many-body molecular models poses a challenge to current data-driven approaches to molecular simulations. Here, we introduce an active learning (AL) framework for generating training sets corresponding to individual many-body contributions to the energy of an N-body system, which are required for the development of MB-nrg potential energy functions (PEFs). Our AL framework is based on uncertainty and error estimation and uses Gaussian process regression to identify the most relevant configurations that are needed for an accurate representation of the energy landscape of the molecular system under examination. Taking the Cs-water system as a case study, we demonstrate that the application of our AL framework results in significantly smaller training sets than previously used in the development of the original MB-nrg PEF, without loss of accuracy. Considering the computational cost associated with high-level electronic structure calculations, our AL framework is particularly well-suited to the development of many-body PEFs, with chemical and spectroscopic accuracy, for molecular-level computer simulations from the gas to the condensed phase.
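The uncertainty-driven selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional toy "energy" function in place of a high-level electronic structure calculation, a fixed RBF kernel with unit prior variance, and a selection rule based only on the Gaussian process predictive standard deviation (the paper's framework also uses error estimation, which is omitted here). All function names and parameters (`toy_energy`, `active_learning`, `tol`, `n_max`) are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel with unit prior variance."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_energy(x):
    """Hypothetical stand-in for an expensive reference calculation."""
    return np.sin(3.0 * x) + 0.5 * x**2


def active_learning(pool, n_init=3, n_max=15, tol=1e-2, seed=0):
    """Greedily add the pool configuration with the largest GP uncertainty."""
    rng = np.random.default_rng(seed)
    selected = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    while len(selected) < n_max:
        x_train = pool[selected]
        # A real workflow would cache reference energies instead of recomputing.
        y_train = toy_energy(x_train)
        _, std = gp_posterior(x_train, y_train, pool)
        std[selected] = 0.0  # never pick a configuration twice
        best = int(np.argmax(std))
        if std[best] < tol:  # model is confident everywhere in the pool
            break
        selected.append(best)
    return selected


pool = np.linspace(-2.0, 2.0, 200)  # candidate "configurations"
selected = active_learning(pool)
print(f"selected {len(selected)} of {len(pool)} candidates")
```

In this sketch the training set grows only where the model is least certain, which is the mechanism by which an active learning loop can reach a given accuracy with fewer reference calculations than uniform or random sampling of the pool.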