Willatt Michael J, Musil Félix, Ceriotti Michele
Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland.
J Chem Phys. 2019 Apr 21;150(15):154110. doi: 10.1063/1.5090481.
The applications of machine learning techniques to chemistry and materials science become more numerous by the day. The main challenge is to devise representations of atomic systems that are at the same time complete and concise, so as to reduce the number of reference calculations that are needed to predict the properties of different types of materials reliably. This has led to a proliferation of alternative ways to convert an atomic structure into an input for a machine-learning model. We introduce an abstract definition of chemical environments that is based on a smoothed atomic density, using a bra-ket notation to emphasize basis set independence and to highlight the connections with some popular choices of representations for describing atomic systems. The correlations between the spatial distribution of atoms and their chemical identities are computed as inner products between these feature kets. These inner products can be given an explicit representation either in terms of the expansion of the atom density on orthogonal basis functions, which is equivalent to the smooth overlap of atomic positions (SOAP) power spectrum, or in real space, where they correspond to n-body correlations of the atom density. This formalism lays the foundations for a more systematic tuning of the behavior of the representations, by introducing operators that represent the correlations between structure, composition, and the target properties. It provides a unifying picture of recent developments in the field and indicates a way forward toward more effective and computationally affordable machine-learning schemes for molecules and materials.
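As a rough illustration of the real-space picture mentioned in the abstract, the sketch below builds the simplest such feature: a 2-body (radial) correlation of a Gaussian-smoothed atom density around one central atom. It is not the authors' implementation. The function names, the toy coordinates, the radial grid, and the choice to smear along the radial coordinate only are all assumptions made for this example. It shows that the resulting feature vector does not change when the environment is rotated.

```python
import math


def radial_density(center, neighbors, grid, sigma=0.3):
    """Gaussian-smeared radial (2-body) density around `center`.

    For every grid radius r, sum one Gaussian of width `sigma` centred at
    each neighbor distance. Simplifying assumption: the smearing is applied
    along the radial coordinate only.
    """
    distances = [math.dist(center, p) for p in neighbors]
    return [
        sum(math.exp(-((r - d) ** 2) / (2.0 * sigma ** 2)) for d in distances)
        for r in grid
    ]


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


# Hypothetical toy environment: a central atom at the origin with three
# neighbors (arbitrary length units).
center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.0)]
grid = [0.25 * k for k in range(1, 13)]  # radii 0.25, 0.50, ..., 3.00

features = radial_density(center, neighbors, grid)

# Rotate the whole environment about the central atom and recompute.
rotated = [rotate_z(p, 0.7) for p in neighbors]
features_rotated = radial_density(center, rotated, grid)

# Largest element-wise difference between the two feature vectors.
max_diff = max(abs(a - b) for a, b in zip(features, features_rotated))
print(max_diff < 1e-12)
```

Because the feature depends only on interatomic distances, it is unchanged by rotations and by reordering the neighbors. It also discards all angular information, which is what higher-order correlations such as the power spectrum are meant to recover.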