De Sandip, Bartók Albert P, Csányi Gábor, Ceriotti Michele
National Center for Computational Design and Discovery of Novel Materials (MARVEL), Switzerland and Laboratory of Computational Science and Modelling, Institute of Materials, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK.
Phys Chem Chem Phys. 2016 May 18;18(20):13754-69. doi: 10.1039/c6cp00415f.
Evaluating the (dis)similarity of crystalline, disordered and molecular compounds is a critical step in the development of algorithms to automatically navigate the configuration space of complex materials. For instance, a structural similarity metric is crucial for classifying structures, searching chemical space for better compounds and materials, and driving the next generation of machine-learning techniques for predicting the stability and properties of molecules and materials. In the last few years, several strategies have been designed to compare atomic coordination environments. In particular, the smooth overlap of atomic positions (SOAP) has emerged as an elegant framework to obtain translation-, rotation- and permutation-invariant descriptors of groups of atoms, underlying the development of various classes of machine-learned inter-atomic potentials. Here we discuss how one can combine such local descriptors using a regularized entropy match (REMatch) approach to describe the similarity of both whole molecular and bulk periodic structures, introducing powerful metrics that enable the navigation of alchemical and structural complexities within a unified framework. Furthermore, using this kernel and a ridge regression method we can predict atomization energies for a database of small organic molecules with a mean absolute error below 1 kcal mol⁻¹, reaching an important milestone in the application of machine-learning techniques for the evaluation of molecular properties.
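The combination of local descriptors into a global kernel, followed by ridge regression, can be illustrated with a minimal sketch. The abstract does not give the working equations, so everything below is an assumption: the REMatch step is written as an entropy-regularized optimal-transport problem with uniform marginals, solved by Sinkhorn iterations, and the input `C` is assumed to be a precomputed matrix of local environment kernels with entries in [0, 1] (for example, SOAP kernels between the atoms of two structures). The function names `rematch_kernel`, `krr_fit` and `krr_predict`, and the parameters `gamma` and `lam`, are hypothetical and chosen for illustration only.

```python
import numpy as np

def rematch_kernel(C, gamma=0.5, n_iter=500, tol=1e-10):
    """Global similarity from a local-kernel matrix C (n x m, entries in [0, 1]).

    Assumed formulation: find the matching P with row sums 1/n and column
    sums 1/m that minimizes sum_ij P_ij * (1 - C_ij + gamma * ln P_ij),
    then return sum_ij P_ij * C_ij.
    """
    C = np.asarray(C, dtype=float)
    n, m = C.shape
    K = np.exp(-(1.0 - C) / gamma)   # Gibbs kernel of the cost 1 - C
    r = np.full(n, 1.0 / n)          # target row marginals
    c = np.full(m, 1.0 / m)          # target column marginals
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):          # Sinkhorn scaling iterations
        u_new = r / (K @ v)
        v = c / (K.T @ u_new)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    P = u[:, None] * K * v[None, :]  # regularized matching matrix
    return float(np.sum(P * C))

def krr_fit(K_train, y, lam=1e-8):
    """Kernel ridge regression weights: solve (K + lam * I) alpha = y."""
    K_train = np.asarray(K_train, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(K_train + lam * np.eye(len(y)), y)

def krr_predict(K_test_train, alpha):
    """Predict properties from kernels between test and training structures."""
    return np.asarray(K_test_train, dtype=float) @ alpha
```

In this sketch, a small `gamma` pushes the matching toward a best one-to-one pairing of environments, while a large `gamma` approaches a plain average of all local kernels. A global kernel matrix built by calling `rematch_kernel` for every pair of structures could then be passed to `krr_fit` together with reference atomization energies.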