Onat Berk, Ortner Christoph, Kermode James R
Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom.
Warwick Mathematics Institute, University of Warwick, Coventry CV4 7AL, United Kingdom.
J Chem Phys. 2020 Oct 14;153(14):144106. doi: 10.1063/5.0016005.
Faithfully representing chemical environments is essential for describing materials and molecules with machine learning approaches. Here, we present a systematic classification of these representations and then investigate (i) the sensitivity to perturbations and (ii) the effective dimensionality of a variety of atomic environment representations over a range of material datasets. The representations investigated include Atom-Centered Symmetry Functions (ACSF), Chebyshev Polynomial Symmetry Functions (CHSF), Smooth Overlap of Atomic Positions (SOAP), Many-Body Tensor Representation (MBTR), and Atomic Cluster Expansion (ACE). In area (i), we show that none of the atomic environment representations are linearly stable under tangential perturbations and that, for CHSF, there are instabilities for particular choices of perturbation, which we show can be removed with a slight redefinition of the representation. In area (ii), we find that most representations can be compressed significantly without loss of precision and, further, that selecting optimal subsets of a representation method improves the accuracy of regression models built for a given dataset.
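To make the notion of effective dimensionality in area (ii) concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes the participation ratio of the covariance spectrum, (tr C)^2 / tr(C^2), as one possible measure; the abstract does not say which measure the authors use. The function names and the synthetic six-component "descriptor" are hypothetical and stand in for a real representation such as SOAP or ACE evaluated on a dataset.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def participation_ratio(rows):
    """Effective dimension (tr C)^2 / tr(C^2) of the feature covariance C.

    Assumption: this is one common measure, chosen for illustration only.
    For symmetric C, tr(C^2) equals the sum of squared matrix entries,
    so no eigendecomposition is needed.
    """
    cov = covariance(rows)
    trace = sum(cov[i][i] for i in range(len(cov)))
    trace_sq = sum(v * v for row in cov for v in row)
    return trace * trace / trace_sq if trace_sq > 0 else 0.0


def toy_descriptors(n_samples=500, seed=0):
    """Hypothetical 6-component descriptor driven by only 3 latent variables.

    Each latent variable appears twice (scaled by 1/sqrt(2)), so half of
    the components are redundant and the representation is compressible.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(2.0)
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        rows.append([scale * z[j % 3] for j in range(6)])
    return rows


features = toy_descriptors()
d_eff = participation_ratio(features)
print(f"nominal dimension: {len(features[0])}, effective dimension: {d_eff:.2f}")
```

In this toy case the effective dimension is close to, and never above, 3 even though each descriptor has 6 components, which is the kind of redundancy that allows a representation to be compressed without losing information.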