Casier Bastien, Chagas da Silva Mauricio, Badawi Michael, Pascale Fabien, Bučko Tomáš, Lebègue Sébastien, Rocca Dario
Université de Lorraine and CNRS, LPCT, UMR 7019, F-54000 Nancy, France.
Department of Physical and Theoretical Chemistry, Faculty of Natural Sciences, Comenius University in Bratislava, Bratislava, Slovakia.
J Comput Chem. 2021 Jul 30;42(20):1390-1401. doi: 10.1002/jcc.26550. Epub 2021 May 19.
The coupling of electronic structure methods with machine learning techniques is a powerful tool for predicting chemical and physical properties of a broad range of systems. To improve prediction accuracy, a large number of representations of molecules and solids have been developed for machine learning applications. In this work we propose a novel descriptor based on the notion of the molecular graph. While graphs are widely employed in classification problems in cheminformatics or bioinformatics, they are seldom used in regression problems, especially those involving energy-related properties. Our method is based on a local decomposition of atomic environments and on the hybridization of two kernel functions: a graph kernel contribution that describes the chemical pattern and a Coulomb label contribution that encodes finer details of the local geometry. The accuracy of this new kernel method in energy predictions for molecular and condensed-phase systems is demonstrated on the popular QM7 and BA10 datasets. These examples show that the hybrid localized graph kernel outperforms traditional approaches such as the smooth overlap of atomic positions (SOAP) and Coulomb matrices.
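As a rough illustration of two ideas named in the abstract, the sketch below builds the standard Coulomb matrix (one of the baseline representations mentioned) and combines a chemical-pattern term with a geometry term as a product of two kernels. It is not the authors' implementation: the function names, the use of a simple label match in place of a real graph kernel, the Gaussian form of the geometry term, and the `sigma` parameter are all assumptions made for illustration only.

```python
import numpy as np


def coulomb_matrix(charges, positions):
    """Standard Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal,
    Z_i * Z_j / |R_i - R_j| off the diagonal."""
    z = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    # Pairwise interatomic distances, shape (n_atoms, n_atoms).
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder so the division below is safe
    m = np.outer(z, z) / dist
    np.fill_diagonal(m, 0.5 * z ** 2.4)
    return m


def hybrid_kernel(label_a, label_b, feat_a, feat_b, sigma=1.0):
    """Hypothetical hybrid kernel between two local atomic environments.

    Assumption: the chemical-pattern term is a plain label match (a stand-in
    for a graph kernel) and the geometry term is a Gaussian on user-supplied
    feature vectors. The product of two valid kernels is again a valid kernel.
    """
    pattern = 1.0 if label_a == label_b else 0.0
    diff = np.asarray(feat_a, dtype=float) - np.asarray(feat_b, dtype=float)
    geometry = np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
    return pattern * geometry


# Usage: H2 with a bond length of 1.4 bohr (atomic units).
m_h2 = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
k_same = hybrid_kernel("C-H", "C-H", [1.0, 2.0], [1.0, 2.0])
k_diff = hybrid_kernel("C-H", "O-H", [1.0, 2.0], [1.0, 2.0])
```

With this product form, two environments are compared geometrically only when their chemical patterns agree, which is one simple way to let a discrete pattern term and a continuous geometry term work together.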