Ng Wai-Pan, Zhang Zili, Yang Jun
Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China.
Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China.
J Chem Theory Comput. 2025 Feb 25;21(4):1602-1614. doi: 10.1021/acs.jctc.4c01261. Epub 2025 Feb 4.
Existing machine learning models attempt to predict the energies of large molecules by training on small molecules, but they eventually lose accuracy because their errors grow with system size. Through an orbital pairwise decomposition of the correlation energy, a neural network model pretrained on hundred-scale data of small molecules is demonstrated to be sufficiently transferable to accurately predict large systems, including molecules and crystals. Our model introduces a residual connection to explicitly learn the pairwise energy corrections, and employs various low-rank retraining techniques to modestly adjust the learned network parameters. We demonstrate that retraining with as few as one larger molecule, starting from a base model originally trained only on small water (H₂O) clusters, allows the MP2 correlation energy of large liquid water (H₂O) systems in a periodic supercell to be predicted at chemical accuracy. Similar performance is observed for large protonated clusters and periodic polyglycine chains. As a demonstrative application, we predict the energy ordering of symmetrically inequivalent sublattices with distinct hydrogen orientations in the ice XV phase. Our work represents an important step forward in the quest for cost-effective, highly accurate and transferable neural network models in quantum chemistry, bridging the electronic structure patterns between small and large systems.
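The three ingredients named in the abstract (a pairwise decomposition of the correlation energy, a residual correction per orbital pair, and a low-rank adjustment of pretrained parameters) can be illustrated with a minimal sketch. This is not the authors' implementation: a single linear layer stands in for the neural network, the feature vectors and baseline pair energies are made-up placeholder numbers, and every name below (`low_rank_weights`, `pair_energy`, `correlation_energy`) is hypothetical. The baseline term that the learned correction is added to is likewise an assumption about how such a residual form might look.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def low_rank_weights(w_frozen: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """Effective weights W + A B.

    In a low-rank retraining scheme, W stays frozen and only the small
    factors A (d_out x r) and B (r x d_in) would be updated.
    """
    delta = matmul(a, b)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w_frozen, delta)]


def pair_energy(features: List[float], weights: Matrix, baseline: float) -> float:
    """Residual form (assumed): baseline pair energy plus a learned linear correction."""
    correction = sum(w * x for w, x in zip(weights[0], features))
    return baseline + correction


def correlation_energy(pairs: List[Tuple[List[float], float]], weights: Matrix) -> float:
    """Total energy as a sum of contributions over orbital pairs (i, j)."""
    return sum(pair_energy(feats, weights, e0) for feats, e0 in pairs)


# Placeholder data: two orbital pairs, each with a 2-component feature
# vector and a baseline pair energy (arbitrary units, not from the paper).
pairs = [([1.0, 2.0], -1.0), ([2.0, 0.0], -2.0)]

w_base = [[0.5, -0.25]]                    # "pretrained" weights, kept frozen
a, b = [[1.0]], [[0.25, 0.5]]              # rank-1 factors adjusted in retraining
w_adapted = low_rank_weights(w_base, a, b)

print(correlation_energy(pairs, w_base))     # energy from the base model
print(correlation_energy(pairs, w_adapted))  # energy after the low-rank update
```

Because the total is a plain sum over pairs, the same per-pair model applies to a system of any size; only the number of terms changes. The rank-1 update here has 3 adjustable numbers, against 2 in the frozen layer, so the saving only appears for realistically sized weight matrices.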