Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory.

Authors

Schwalbe-Koda Daniel, Hamel Sebastien, Sadigh Babak, Zhou Fei, Lordi Vincenzo

Affiliations

Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA.

Department of Materials Science and Engineering, University of California, Los Angeles, CA, 90095, USA.

Publication information

Nat Commun. 2025 Apr 29;16(1):4014. doi: 10.1038/s41467-025-59232-0.

Abstract

An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.
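The abstract does not state which estimator the framework uses, so the following is only an illustrative sketch of how an information entropy over atom-centered environments could be computed without training a model. It assumes each environment is already encoded as a fixed-length descriptor vector (one row of `x`) and uses a Gaussian kernel density estimate. The function names, the `bandwidth` parameter, and the choice of kernel are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def kernel_matrix(a, b, bandwidth):
    """Gaussian kernel between rows of a (n, d) and rows of b (m, d).

    Assumption: descriptors are compared with a Euclidean distance.
    """
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def dataset_entropy(x, bandwidth=1.0):
    """Kernel-based entropy estimate of a set of descriptors, in nats.

    H = -(1/n) * sum_i log[(1/n) * sum_j K(x_i, x_j)]
    Identical environments give H = 0; n fully distinct ones give log(n).
    """
    k = kernel_matrix(x, x, bandwidth)
    return float(-np.mean(np.log(k.mean(axis=1))))


def novelty(y, x, bandwidth=1.0):
    """Hypothetical out-of-distribution score for environments y given x.

    Returns -log sum_j K(y, x_j) per row of y. Values grow as y moves
    away from every environment in the reference set x.
    """
    k = kernel_matrix(np.atleast_2d(y), x, bandwidth)
    return -np.log(k.sum(axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(200, 8))  # stand-in for real descriptors
    print("entropy (nats):", dataset_entropy(reference))
    print("novelty of a shifted sample:", novelty(reference[0] + 3.0, reference))
```

In this sketch a redundant dataset (many near-identical environments) has low entropy, while a sample that scores high on `novelty` overlaps little with the reference set, which is the intuition behind using such a quantity to flag out-of-distribution environments. Very distant samples can underflow the kernel to zero, so a practical version would need to guard the logarithm.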

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c7d/12041501/d31b426e0238/41467_2025_59232_Fig1_HTML.jpg
