Centre for Quantum Materials and Technologies, School of Mathematics and Physics, Queen's University Belfast, Belfast BT7 1NN, U.K.
Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
J Chem Inf Model. 2024 Jun 10;64(11):4426-4435. doi: 10.1021/acs.jcim.4c00421. Epub 2024 May 28.
The polarization of periodically repeating systems is a discontinuous function of the atomic positions, a fact that seems at first to stymie attempts at statistical learning of this quantity. Two approaches to building models for bulk polarizations are compared. In the first, a simple point charge model is used to preprocess the raw polarization, giving a learning target that is a smooth function of atomic positions, and the total polarization is learned as a sum of atom-centered dipoles. In the second, the average position of the Wannier centers around each atom is predicted instead. For a range of bulk aqueous systems, both methods perform comparably well, with the former being slightly better but often requiring extra effort to find a suitable point charge model. As a challenging test, we also analyze the performance of the models at the air-water interface. In this case, while the Wannier center approach delivers accurate predictions without further modification, the preprocessing method requires augmentation with information from isolated water molecules to reach similar accuracy. Finally, we present a simple protocol to preprocess the polarizations in a data-driven way using a small number of derivatives calculated at a much lower level of theory, thus removing the need to find point charge models without appreciably increasing the computational cost. We believe that the training strategies presented here will help in the construction of the accurate polarization models required to study the dielectric properties of realistic complex bulk systems and interfaces with ab initio accuracy.
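To make the discontinuity and the point charge preprocessing concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. The cell length `L`, the two unit charges, the functional form of the smooth correction, and all function names are hypothetical choices made purely for illustration. The toy "reference" dipole jumps by one charge times the cell length when an ion crosses the cell boundary, whereas the residual left after subtracting a point charge model varies smoothly and is therefore a more suitable learning target.

```python
import numpy as np

L = 10.0  # hypothetical cell length, arbitrary units


def wrap(x, L):
    """Map positions into the primary cell [0, L)."""
    return x - L * np.floor(x / L)


def point_charge_dipole(x, q, L):
    """Dipole of fixed point charges q placed at the wrapped positions."""
    return float(np.dot(q, wrap(x, L)))


def reference_dipole(x, q, L):
    """Toy stand-in for a raw reference dipole (an assumption, not real data):
    a point charge part plus a small, smooth, periodic correction."""
    smooth = 0.05 * np.sin(2.0 * np.pi * (x[1] - x[0]) / L)
    return point_charge_dipole(x, q, L) + smooth


q = np.array([+1.0, -1.0])            # hypothetical ionic charges
path = np.linspace(9.5, 10.5, 11)     # cation moves across the boundary at x = L

raw = np.array([reference_dipole(np.array([s, 4.0]), q, L) for s in path])
pc = np.array([point_charge_dipole(np.array([s, 4.0]), q, L) for s in path])

# Preprocessed learning target: raw dipole minus the point charge model.
target = raw - pc

print("largest step in raw dipole:   ", np.max(np.abs(np.diff(raw))))
print("largest step in smooth target:", np.max(np.abs(np.diff(target))))
```

In this toy setting the point charge model removes the jump exactly because the reference was built from the same charges. For real ab initio data the charges have to be chosen so that the residual stays on a single branch, which is the extra effort the abstract refers to.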