Grasselli Federico, Chong Sanggyu, Kapil Venkat, Bonfanti Silvia, Rossi Kevin
Dipartimento di Scienze Fisiche, Informatiche e Matematiche, Università degli Studi di Modena e Reggio Emilia, 41125 Modena, Italy
CNR NANO S3, 41125 Modena, Italy
Digit Discov. 2025 Jun 9. doi: 10.1039/d5dd00102a.
The widespread adoption of machine learning surrogate models has significantly expanded the scale and complexity of the systems and processes that can be explored accurately and efficiently with atomistic modeling. However, the inherently data-driven nature of machine learning models introduces uncertainties that must be quantified, understood, and effectively managed to ensure reliable predictions and conclusions. In this perspective, we first survey state-of-the-art uncertainty estimation methods, from Bayesian frameworks to ensembling techniques, and discuss their application in atomistic modeling. We then examine the interplay between model accuracy, uncertainty, training dataset composition, data acquisition strategies, model transferability, and robustness. In doing so, we synthesize insights from the existing literature and highlight areas of ongoing debate.