Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA.
J Chem Phys. 2018 Jan 21;148(3):034101. doi: 10.1063/1.5008630.
Molecular fingerprints, i.e., feature vectors describing atomistic neighborhood configurations, are an important abstraction and a key ingredient for data-driven modeling of potential energy surfaces and interatomic forces. In this paper, we present the density-encoded canonically aligned fingerprint algorithm, a robust and efficient method for fitting per-atom scalar and vector quantities. The fingerprint is essentially a continuous density field formed by superimposing smoothing kernels centered on the atoms. Rotational invariance of the fingerprint is achieved by aligning, for each fingerprint instance, the neighboring atoms onto a local canonical coordinate frame computed from a kernel minisum optimization procedure. We show that this approach is superior to principal component analysis-based methods, especially when the atomistic neighborhood is sparse and/or contains symmetry. We propose that the "distance" between two density fields be measured using a volume integral of their pointwise difference. This integral can be computed efficiently using optimal quadrature rules, which require discrete sampling at only a small number of grid points. We also experiment with the choice of weight functions for constructing the density fields and characterize their performance for fitting interatomic potentials. The applicability of the fingerprint is demonstrated through a set of benchmark problems.
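To make the density-field and distance ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes isotropic Gaussian smoothing kernels, a squared pointwise difference (an L2-type distance), a cubic integration domain, and a tensor-product Gauss-Legendre rule; the abstract does not specify any of these choices. The function names `density`, `cube_quadrature`, and `fingerprint_distance` and all parameter values are hypothetical, and the neighborhoods are assumed to be already aligned to their canonical frames.

```python
import numpy as np

def density(points, atoms, sigma=0.5):
    """Density field: sum of Gaussian kernels centered on the atoms.

    Assumption: isotropic Gaussian kernel of width sigma (illustrative only).
    """
    diff = points[:, None, :] - atoms[None, :, :]       # shape (P, A, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)                # shape (P, A)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).sum(axis=1)

def cube_quadrature(half_width, n):
    """Tensor-product Gauss-Legendre nodes and weights on [-h, h]^3.

    Assumption: a cubic domain stands in for the fingerprint's actual domain.
    """
    x, w = np.polynomial.legendre.leggauss(n)           # rule on [-1, 1]
    x, w = x * half_width, w * half_width               # rescale to [-h, h]
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    nodes = np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=1)
    weights = (w[:, None, None] * w[None, :, None] * w[None, None, :]).ravel()
    return nodes, weights

def fingerprint_distance(atoms_a, atoms_b, half_width=3.0, n=12, sigma=0.5):
    """Approximate sqrt of the volume integral of (rho_a - rho_b)^2.

    Both neighborhoods are assumed to be pre-aligned to a canonical frame.
    """
    nodes, weights = cube_quadrature(half_width, n)
    d = density(nodes, atoms_a, sigma) - density(nodes, atoms_b, sigma)
    return float(np.sqrt(np.sum(weights * d ** 2)))

# Two toy neighborhoods (coordinates relative to the central atom).
atoms_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
atoms_b = np.array([[0.0, 0.0, 0.0], [0.0, 1.2, 0.0]])
dist_ab = fingerprint_distance(atoms_a, atoms_b)
```

Only the density values at the quadrature nodes are needed, so in a sketch like this each fingerprint could be stored as a fixed-length vector of sampled densities and compared with a weighted Euclidean norm.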