Sinz Paul, Swift Michael W, Brumwell Xavier, Liu Jialin, Kim Kwang Jin, Qi Yue, Hirn Matthew
Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824-1226, USA.
Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1226, USA.
J Chem Phys. 2020 Aug 28;153(8):084109. doi: 10.1063/5.0016020.
The dream of machine learning in materials science is for a model to learn the underlying physics of an atomic system, allowing it to move beyond interpolating within the training set to predicting properties that were not present in the original training data. Beyond advances in machine learning architectures and training techniques, achieving this ambitious goal requires a method for converting a 3D atomic system into a feature representation that preserves rotational and translational symmetries, is smooth under small perturbations, and is invariant under re-ordering of the atoms. The atomic orbital wavelet scattering transform preserves these symmetries by construction and has proven highly successful as a featurization method for machine learning energy prediction. For both small molecules and the bulk amorphous LiSi system, machine learning models that use wavelet scattering coefficients as features have achieved accuracy comparable to density functional theory at a small fraction of its computational cost. In this work, we test how well our LiSi energy predictor generalizes to properties that were not included in the training set, such as elastic constants and migration barriers. We demonstrate that statistical feature selection methods can reduce over-fitting and lead to remarkable accuracy in these extrapolation tasks.
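To make the symmetry requirements above concrete, here is a minimal Python/NumPy sketch of a toy descriptor that satisfies them. It is not the atomic orbital wavelet scattering transform; the function names `radial_gaussian_features` and `random_rotation`, the Gaussian width, and the radial grid are hypothetical choices made only for illustration. The toy descriptor sums a Gaussian bump at every interatomic distance, so it is unchanged by rotations, translations, and re-ordering of the atoms, and it varies smoothly with the atomic positions. It also ignores periodic boundary conditions, which a bulk system such as amorphous LiSi would require.

```python
import numpy as np


def radial_gaussian_features(positions, centers, width=0.5):
    """Toy invariant descriptor (hypothetical; NOT the wavelet scattering transform).

    Places a Gaussian at every pairwise distance and evaluates the sum on a
    fixed grid of radial centers. Pairwise distances do not change under
    rotations or translations, summing over all pairs removes any dependence
    on atom ordering, and the Gaussians make the output smooth in the positions.
    """
    pos = np.asarray(positions, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]      # (N, N, 3) displacement vectors
    dist = np.linalg.norm(diff, axis=-1)          # (N, N) distance matrix
    iu = np.triu_indices(len(pos), k=1)           # each unordered pair once
    d = dist[iu]                                  # (P,) pair distances
    bumps = np.exp(-((d[:, None] - centers[None, :]) ** 2) / (2.0 * width**2))
    return bumps.sum(axis=0)                      # (len(centers),) feature vector


def random_rotation(rng):
    """Random proper rotation matrix (orthogonal, det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))                   # make the factorization unique
    if np.linalg.det(q) < 0:                      # flip one axis to exclude reflections
        q[:, 0] = -q[:, 0]
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 5.0, size=(8, 3))        # 8 random "atoms" in a 5 x 5 x 5 box
    centers = np.linspace(0.0, 8.0, 33)
    f0 = radial_gaussian_features(x, centers)

    # Rotate, translate, and re-order the atoms: the features should not change.
    R = random_rotation(rng)
    t = rng.normal(size=3)
    perm = rng.permutation(len(x))
    x_moved = (x @ R.T + t)[perm]
    print(np.allclose(f0, radial_gaussian_features(x_moved, centers)))  # True
```

Working with pairwise distances is the simplest way to obtain all three invariances at once, but such a descriptor discards angular information; richer representations, such as the wavelet scattering coefficients described in the abstract, are designed to retain more structure while keeping the same symmetries.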