Fonseca Gregory, Poltavsky Igor, Tkatchenko Alexandre
Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg.
J Chem Theory Comput. 2023 Dec 12;19(23):8706-8717. doi: 10.1021/acs.jctc.3c00985. Epub 2023 Nov 27.
As the sophistication of machine learning force fields (MLFFs) increases to match the complexity of extended molecules and materials, so does the need for tools to properly analyze and assess their practical performance. To go beyond average error metrics and obtain a complete picture of a model's applicability and limitations, we developed FFAST (force field analysis software and tools): a cross-platform software package, complete with an easy-to-use graphical user interface, that provides detailed insights into a model's performance. The software allows the user to gauge the performance of any molecular force field, such as popular state-of-the-art MLFF models, on various popular data set types, providing general prediction error overviews, outlier detection mechanisms, atom-projected errors, and more. It includes a 3D visualizer to find and picture problematic configurations, atoms, or clusters in a large data set. In this paper, the MACE and NequIP models are applied to two data sets of interest, stachyose and docosahexaenoic acid (DHA), to illustrate the use cases of the software. This analysis shows that carbons and oxygens involved in or near glycosidic bonds inside the stachyose molecule present increased prediction errors. In addition, prediction errors on DHA rise as the molecule folds, especially for the carboxylic group at the edge of the molecule. We emphasize the need for a systematic assessment of MLFF models to ensure their successful application to the study of the dynamics of molecules and materials.
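To make the notions of atom-projected errors and outlier detection concrete, the following is a minimal NumPy sketch, not FFAST's own API or implementation. It assumes predicted and reference forces are available as arrays of shape (n_configs, n_atoms, 3). The function names, the 95th-percentile threshold, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def atom_projected_mae(f_pred, f_ref):
    """Mean absolute force error per atom.

    Averages over configurations (axis 0) and Cartesian components (axis 2),
    leaving one value per atom.
    """
    return np.abs(f_pred - f_ref).mean(axis=(0, 2))


def config_force_mae(f_pred, f_ref):
    """Mean absolute force error per configuration (averaged over atoms and components)."""
    return np.abs(f_pred - f_ref).mean(axis=(1, 2))


def flag_outliers(errors, quantile=0.95):
    """Indices of entries whose error exceeds the given quantile (assumed threshold)."""
    threshold = np.quantile(errors, quantile)
    return np.flatnonzero(errors > threshold)


# Synthetic stand-in data: 200 configurations of a 5-atom system.
rng = np.random.default_rng(0)
n_configs, n_atoms = 200, 5
f_ref = rng.normal(size=(n_configs, n_atoms, 3))
noise = rng.normal(scale=0.05, size=f_ref.shape)
noise[:, 3, :] *= 10.0  # atom 3 is deliberately made hard to predict
noise[7] *= 20.0        # configuration 7 is deliberately made an outlier
f_pred = f_ref + noise

per_atom = atom_projected_mae(f_pred, f_ref)
per_config = config_force_mae(f_pred, f_ref)
outliers = flag_outliers(per_config)

print("per-atom MAE:", per_atom)
print("worst atom index:", int(np.argmax(per_atom)))
print("flagged configurations:", outliers)
```

Averaging over different axes of the same error array is what separates the two views: per-atom averages point to problematic atoms, such as those near a glycosidic bond, while per-configuration averages point to problematic geometries, such as folded conformers.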