Suman Divya, Nigam Jigyasa, Saade Sandra, Pegolo Paolo, Türk Hanna, Zhang Xing, Chan Garnet Kin-Lic, Ceriotti Michele
Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland.
Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
J Chem Theory Comput. 2025 Jul 8;21(13):6505-6516. doi: 10.1021/acs.jctc.5c00522. Epub 2025 Jun 25.
Traditional atomistic machine learning (ML) models serve as surrogates for quantum mechanical (QM) properties, predicting quantities such as dipole moments and polarizabilities directly from the compositions and geometries of atomic configurations. With the emergence of ML approaches that predict the "ingredients" of a QM calculation, such as the ground-state charge density or the effective single-particle Hamiltonian, it has become possible to obtain multiple properties through analytical, physics-based operations on these intermediate ML predictions. We present a framework that integrates the prediction of an effective electronic Hamiltonian, for both molecular and condensed-phase systems, with PySCFAD, a differentiable QM workflow. This integration makes it possible to train models indirectly against functions of the Hamiltonian, such as electronic energy levels, dipole moments, and polarizabilities. We then use this framework to explore choices within the design space of hybrid ML/QM models, examining how incorporating multiple targets influences model performance and learning a reduced-basis ML Hamiltonian that can reproduce targets computed with a much larger basis set. Our benchmarks evaluate the accuracy and transferability of these hybrid models, compare them against predictions of atomic properties from their surrogate models, and provide indications to guide the design of the interface between the ML and QM components of the model.
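As an illustration of what training "indirectly against functions of the Hamiltonian" can mean, the following is a minimal JAX sketch, not the authors' implementation and not the PySCFAD API. It assumes a toy setting: a hypothetical `hamiltonian` function stands in for the ML model by mapping six parameters to a symmetric 3×3 matrix, the only target is the set of eigenvalues (energy levels) of a made-up reference matrix, and plain gradient descent replaces whatever optimizer a real workflow would use. All names, sizes, and numerical values are assumptions chosen for the example.

```python
import jax
import jax.numpy as jnp


def hamiltonian(params):
    """Hypothetical stand-in for an ML model: 6 parameters -> symmetric 3x3 matrix."""
    # Fill the upper triangle (including the diagonal) with the parameters.
    upper = jnp.zeros((3, 3)).at[jnp.triu_indices(3)].set(params)
    # Mirror the strictly upper part to make the matrix symmetric.
    return upper + jnp.triu(upper, k=1).T


def energy_levels(params):
    """Derived property: eigenvalues of the predicted matrix, in ascending order."""
    return jnp.linalg.eigvalsh(hamiltonian(params))


def loss(params, target_levels):
    """Mean squared error on the energy levels, not on the matrix elements."""
    return jnp.mean((energy_levels(params) - target_levels) ** 2)


# Made-up reference data: only its eigenvalues are shown to the "model".
reference_params = jnp.array([-1.0, 0.2, 0.0, 0.0, 0.3, 1.0])
target_levels = energy_levels(reference_params)

# Arbitrary starting guess for the model parameters.
params = jnp.array([-0.5, 0.05, 0.0, 0.1, 0.05, 0.6])
initial_loss = loss(params, target_levels)

# Automatic differentiation propagates the loss gradient through the
# eigenvalue solver back to the parameters that define the matrix.
grad_fn = jax.jit(jax.grad(loss))
for _ in range(200):
    params = params - 0.1 * grad_fn(params, target_levels)

final_loss = loss(params, target_levels)
print("initial loss:", float(initial_loss))
print("final loss:  ", float(final_loss))
```

Because the loss only sees eigenvalues, the optimized matrix need not match the reference matrix element by element; many different symmetric matrices share the same spectrum. In this toy setting, that is the sense in which the learned object is an effective Hamiltonian constrained only by the properties derived from it.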