Musil Félix, Veit Max, Goscinski Alexander, Fraux Guillaume, Willatt Michael J, Stricker Markus, Junge Till, Ceriotti Michele
Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland.
Laboratory for Multiscale Mechanics Modeling, Institute of Mechanical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland.
J Chem Phys. 2021 Mar 21;154(11):114109. doi: 10.1063/5.0044689.
Physically motivated and mathematically robust atom-centered representations of molecular structures are key to the success of modern atomistic machine learning. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules and to explore and visualize their chemical structures and compositions. Recently, it has become clear that many of the most effective representations share a fundamental formal connection: they can all be expressed as a discretization of n-body correlation functions of the local atom density. This suggests the opportunity of standardizing and, more importantly, optimizing their evaluation. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping of new rotationally equivariant atomistic representations. As an example, we discuss smooth overlap of atomic positions (SOAP) features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis set. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to reduce the total computational cost by a factor of up to 4 without affecting the model's symmetry properties and without significantly impacting its accuracy.
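The combination of feature-space reduction and kernel ridge regression described above can be illustrated with a minimal NumPy sketch. It is not librascal's API: the function names (`fps_columns`, `poly_kernel`, `krr_fit`), the random stand-in feature matrix, the use of farthest point sampling as the reduction technique, and the normalized polynomial kernel with exponent `zeta` are all assumptions made for illustration. The sketch keeps a subset of feature columns, so if every original feature is rotationally invariant, the reduced feature vector remains invariant as well.

```python
import numpy as np


def fps_columns(X, n_select, start=0):
    """Pick n_select feature columns of X by farthest point sampling.

    Assumption: FPS is used here only as one example of a data reduction
    technique in feature space.
    """
    cols = X.T  # treat every feature column as a point
    selected = [start]
    # squared distance from each column to its nearest selected column
    d2 = np.sum((cols - cols[start]) ** 2, axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(d2))
        selected.append(nxt)
        d2 = np.minimum(d2, np.sum((cols - cols[nxt]) ** 2, axis=1))
    return np.array(selected)


def poly_kernel(A, B, zeta=2):
    """Normalized dot-product kernel raised to the power zeta (assumed form)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T) ** zeta


def krr_fit(K, y, reg):
    """Solve (K + reg * I) w = y for the kernel ridge regression weights."""
    return np.linalg.solve(K + reg * np.eye(K.shape[0]), y)


# Hypothetical data: 40 environments with 32 features each (random stand-ins,
# not real SOAP vectors) and random target values.
rng = np.random.default_rng(0)
X = rng.random((40, 32))
y = rng.random(40)

# Keep one quarter of the feature columns.
selected = fps_columns(X, n_select=8)
X_small = X[:, selected]

reg = 1e-3
K = poly_kernel(X_small, X_small)
weights = krr_fit(K, y, reg)
predictions = K @ weights  # predictions on the training environments
```

With fewer columns, each kernel entry costs proportionally less to evaluate. How much accuracy is lost depends on the data and on which features are kept, which this sketch does not measure.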