Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA.
Nat Comput Sci. 2023 Mar;3(3):230-239. doi: 10.1038/s43588-023-00406-5. Epub 2023 Mar 6.
Machine learning (ML) models, if trained on data sets of high-fidelity quantum simulations, produce accurate and efficient interatomic potentials. Active learning (AL) is a powerful tool for iteratively generating diverse data sets. In this approach, the ML model provides an uncertainty estimate along with its prediction for each new atomic configuration. If the uncertainty estimate exceeds a chosen threshold, the configuration is added to the data set. Here we develop a strategy to more rapidly discover configurations that meaningfully augment the training data set. The approach, uncertainty-driven dynamics for active learning (UDD-AL), modifies the potential energy surface used in molecular dynamics simulations to favor regions of configuration space where the model uncertainty is large. The performance of UDD-AL is demonstrated for two AL tasks: sampling the conformational space of glycine and sampling configurations that promote proton transfer in acetylacetone. The method is shown to efficiently explore chemically relevant configuration space that may be inaccessible to regular dynamical sampling at the target temperature.
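The two ingredients described above, threshold-based selection and an uncertainty-dependent modification of the potential energy surface, can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation. The abstract does not give the bias function, so the exponential form below, the parameters `A` and `B`, the toy ensemble `ensemble_predict`, and the use of the ensemble standard deviation as the uncertainty estimate are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy ensemble: each member is a double-well potential whose
# quartic tail differs slightly, so members agree near x = 0 and disagree
# far from it. A real application would use trained ML potentials instead.
COEFFS = np.array([-0.02, 0.0, 0.02])


def ensemble_predict(x, coeffs=COEFFS):
    """Return energies of shape (n_members, n_points) for positions x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    base = (x**2 - 1.0) ** 2
    return base[None, :] + coeffs[:, None] * x[None, :] ** 4


def uncertainty(x, coeffs=COEFFS):
    """Ensemble standard deviation, used here as the uncertainty estimate."""
    return ensemble_predict(x, coeffs).std(axis=0)


def bias_energy(sigma, A, B):
    """Assumed bias form: zero at sigma = 0, approaching -A for large sigma.

    A sets the maximum energy lowering, B the uncertainty scale.
    """
    sigma = np.asarray(sigma, dtype=float)
    return A * (np.exp(-(sigma**2) / B**2) - 1.0)


def biased_energy(x, A, B, coeffs=COEFFS):
    """Ensemble-mean energy plus the bias, favoring high-uncertainty regions."""
    energies = ensemble_predict(x, coeffs)
    return energies.mean(axis=0) + bias_energy(energies.std(axis=0), A, B)


def select_for_labeling(x, threshold, coeffs=COEFFS):
    """Return the configurations whose uncertainty exceeds the threshold."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return x[uncertainty(x, coeffs) > threshold]


if __name__ == "__main__":
    grid = np.linspace(-2.0, 2.0, 9)
    print("selected:", select_for_labeling(grid, threshold=0.1))
    print("biased energies:", biased_energy(grid, A=1.0, B=0.2))
```

Because the assumed bias is never positive, dynamics run on `biased_energy` see lower energies wherever the ensemble disagrees, which is the sense in which the modified surface favors uncertain regions. Configurations returned by `select_for_labeling` would then be sent for a reference quantum calculation and added to the training set.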