Institute of Materials Chemistry, TU Wien, A-1060 Vienna, Austria.
J Chem Inf Model. 2024 Aug 26;64(16):6377-6387. doi: 10.1021/acs.jcim.4c00904. Epub 2024 Aug 7.
Machine learning potentials have become an essential tool for atomistic simulations, yielding results close to ab initio simulations at a fraction of the computational cost. With recent improvements in achievable accuracy, the focus has shifted to the composition of the data set itself. Reliably identifying erroneously predicted configurations in order to extend a given data set is therefore a high priority. Yet uncertainty estimation techniques have achieved mixed results for machine learning potentials, and a general and versatile method to correlate energy or atomic force uncertainties with the model error has remained elusive to date. In the current work, we show that epistemic uncertainty cannot correlate with model error by definition, but that it can be aggregated over groups of atoms to yield a strong correlation. We demonstrate that our method correctly estimates prediction errors both globally per structure and locally resolved per atom. The direct correlation of local uncertainty and local error is used to design an active learning framework that identifies local subregions of a large simulation cell and then performs ab initio calculations only for those subregions. We successfully used this method to perform active learning in the low-data regime for liquid water.
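The aggregation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the uncertainty measure or the aggregation rule. The sketch assumes that per-atom uncertainty is the spread of force predictions across an ensemble of models, that groups are defined by a simple distance cutoff without periodic boundaries, and that aggregation is a root mean square over each group. All function names (`per_atom_uncertainty`, `neighbor_groups`, `aggregate`, `select_subregion`) are hypothetical.

```python
import numpy as np

def per_atom_uncertainty(ensemble_forces):
    """Spread of force predictions per atom (assumed uncertainty measure).

    ensemble_forces: array of shape (n_models, n_atoms, 3).
    Returns an array of shape (n_atoms,).
    """
    # Standard deviation of each force component across the models,
    # then the Euclidean norm over the three components.
    std = ensemble_forces.std(axis=0)
    return np.linalg.norm(std, axis=1)

def neighbor_groups(positions, cutoff):
    """Boolean (n_atoms, n_atoms) mask of atoms within `cutoff` of each atom.

    Each atom belongs to its own group. Periodic boundaries are ignored
    here for brevity (an assumption of this sketch).
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist <= cutoff

def aggregate(values, groups):
    """Root mean square of per-atom values over each atom's group."""
    weights = groups.astype(float)
    counts = weights.sum(axis=1)
    return np.sqrt(weights @ values**2 / counts)

def select_subregion(positions, atom_uncertainty, cutoff):
    """Pick the atom with the largest aggregated uncertainty and its group.

    Returns (center_index, member_indices). The members would be the
    candidate subregion for a follow-up ab initio calculation.
    """
    groups = neighbor_groups(positions, cutoff)
    local = aggregate(atom_uncertainty, groups)
    center = int(np.argmax(local))
    return center, np.flatnonzero(groups[center])
```

In an active learning loop, `select_subregion` would be called on frames of a large simulation cell, and only the returned atoms (plus whatever embedding the reference method needs) would be passed to the ab initio code. The same `aggregate` function can be applied to per-atom force errors, so that aggregated uncertainty and aggregated error are compared on equal footing.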