National Center for Computational Design and Discovery of Novel Materials (MARVEL), Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
Phys Chem Chem Phys. 2018 Dec 5;20(47):29661-29668. doi: 10.1039/c8cp05921g.
Machine learning of atomic-scale properties amounts to extracting correlations between structure, composition and the quantity that one wants to predict. Representing the input structure in a way that best reflects such correlations makes it possible to improve the accuracy of the model for a given amount of reference data. When the description of the structures is transparent and well-principled, optimizing the representation can also reveal insights into the chemistry of the data set. Here we show how one can generalize the SOAP kernel to introduce a distance-dependent weight that accounts for the multi-scale nature of the interactions, and a description of correlations between chemical species. We show that this substantially improves the performance of ML models of molecular and materials stability, while making it easier to work with complex, multi-component systems and to extend SOAP to coarse-grained intermolecular potentials. The element correlations that give the best-performing model show striking similarities with the conventional periodic table of the elements, providing an inspiring example of how machine learning can rediscover, and generalize, intuitive concepts that constitute the foundations of chemistry.
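The two generalizations named in the abstract, a distance-dependent weight and a coupling between chemical species, can be illustrated with a deliberately simplified toy kernel. The sketch below is not the SOAP kernel and is not taken from the paper: the functional form of `radial_weight`, its parameters `r0`, `m` and `c`, the embedding matrix `U`, and the radial-only Gaussian comparison in `toy_kernel` are all assumptions chosen for illustration.

```python
import numpy as np

def radial_weight(r, r0=2.0, m=4, c=1.0):
    """Assumed distance-dependent weight: close to 1 for r << r0,
    decaying like (r0 / r)**m for r >> r0. Hypothetical functional form."""
    r = np.asarray(r, dtype=float)
    return c / (c + (r / r0) ** m)

def species_coupling(U):
    """Species-species coupling kappa = U U^T, built from row-normalized
    element embeddings (one row per element). Illustrative assumption."""
    U = np.asarray(U, dtype=float)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return U @ U.T

def toy_kernel(env_a, env_b, kappa, sigma=0.5, **weight_params):
    """Toy radial-only kernel between two atomic environments.
    Each environment is a list of (species_index, distance) neighbours.
    Neighbour pairs are compared with a Gaussian in distance, scaled by the
    radial weights and by the coupling between their species."""
    k = 0.0
    for sa, ra in env_a:
        for sb, rb in env_b:
            k += (kappa[sa, sb]
                  * radial_weight(ra, **weight_params)
                  * radial_weight(rb, **weight_params)
                  * np.exp(-(ra - rb) ** 2 / (2.0 * sigma ** 2)))
    return float(k)

# Three hypothetical elements embedded in a 2D "alchemical" space:
# elements 0 and 1 are similar, elements 0 and 2 are uncorrelated.
U = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
kappa = species_coupling(U)

env_a = [(0, 1.0), (1, 1.5)]   # (species index, distance) pairs
env_b = [(1, 1.1), (2, 3.0)]
print(toy_kernel(env_a, env_b, kappa))
```

In this sketch, setting `kappa` to the identity matrix recovers a kernel in which different species are treated as entirely unrelated, while off-diagonal entries let similar elements share information. Optimizing embeddings of this kind is, loosely, the sort of procedure through which element correlations could be learned from data.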