Lei Yao-Kun, Yagi Kiyoshi, Sugita Yuji
Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan.
Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan.
J Chem Theory Comput. 2025 Mar 11;21(5):2695-2711. doi: 10.1021/acs.jctc.4c01393. Epub 2025 Mar 2.
Machine learning (ML) methods have emerged as efficient surrogates for high-level electronic structure theory, offering both accuracy and computational efficiency. However, the vast conformational and chemical space remains a challenge when constructing a general force field. Training data sets typically cover only a limited region of this space, which results in poor extrapolation performance. To address this problem, traditional strategies must retrain models from scratch on the combined old and new data sets. Model transferability is also crucial for general force field construction. Existing ML force fields are designed for closed systems with no external environmental potential. They therefore exhibit limited transferability to complex condensed-phase systems such as enzymatic reactions, which leads to inferior performance and high memory costs. Our ML/MM model, based on the Taylor expansion of the electrostatic operator, previously showed high transferability between reactions in several simple solvents. This work extends that strategy to enzymatic reactions to explore transferability between more complex, heterogeneous environments. In addition, we apply continual learning strategies based on memory data sets to enable autonomous, on-the-fly training on a continuous stream of new data. By combining these two methods, we can efficiently construct a force field that applies to chemical reactions in various environmental media.
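The abstract does not spell out the expansion itself. As a hedged illustration, the generic textbook form of a Taylor (multipole) expansion of the electrostatic interaction between a QM charge density and MM point charges is shown below. The notation (Q_J, R_A, q_A, μ_A, Θ_A) and the truncation after the second-order term are assumptions for illustration, not the paper's exact formulation. Here ρ(r) is partitioned into atomic contributions ρ_A, with q_A = ∫ρ_A dr, μ_A = ∫ρ_A (r − R_A) dr, and Θ_{A,ij} = ∫ρ_A (r − R_A)_i (r − R_A)_j dr. The expansion assumes the MM charges lie outside the QM density.

```latex
\begin{aligned}
\phi(\mathbf{r}) &= \sum_{J \in \mathrm{MM}} \frac{Q_J}{\lvert \mathbf{r}-\mathbf{R}_J \rvert},
\qquad \mathbf{E}(\mathbf{r}) = -\nabla\phi(\mathbf{r}), \\
U_{\mathrm{int}} &= \int \rho(\mathbf{r})\,\phi(\mathbf{r})\,d\mathbf{r} \\
&\approx \sum_{A \in \mathrm{QM}} \Big[ q_A\,\phi(\mathbf{R}_A)
 - \boldsymbol{\mu}_A \cdot \mathbf{E}(\mathbf{R}_A)
 - \tfrac{1}{2} \sum_{i,j} \Theta_{A,ij}\, \partial_i E_j(\mathbf{R}_A) + \cdots \Big]
\end{aligned}
```

In this form, the environment enters only through the potential and its derivatives evaluated at the QM atom positions.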
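The abstract does not say how the memory data set is maintained. As a hedged illustration of the general idea behind memory-based (rehearsal) continual learning, the sketch below keeps a fixed-size memory of past training samples using reservoir sampling and mixes replayed memories into each new chunk of data from the stream. `ReplayMemory`, `make_training_batch`, the capacity, and the replay count are hypothetical choices, not the authors' implementation. The actual model update step is omitted.

```python
import random
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ReplayMemory(Generic[T]):
    """Fixed-size memory of past training samples (hypothetical helper).

    Reservoir sampling keeps each sample seen so far with equal
    probability capacity / n_seen, so earlier regions of configuration
    space stay represented without storing the whole data stream.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.buffer: List[T] = []
        self.n_seen = 0
        self._rng = random.Random(seed)

    def add(self, sample: T) -> None:
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            # Overwrite a random slot with probability capacity / n_seen.
            j = self._rng.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, k: int) -> List[T]:
        # Draw up to k stored samples without replacement.
        k = min(k, len(self.buffer))
        return self._rng.sample(self.buffer, k)


def make_training_batch(
    new_samples: Sequence[T], memory: ReplayMemory[T], n_replay: int
) -> List[T]:
    """Combine a chunk of new data with replayed memories, then store the new data."""
    batch = list(new_samples) + memory.sample(n_replay)
    for s in new_samples:
        memory.add(s)
    return batch


if __name__ == "__main__":
    memory: ReplayMemory[tuple] = ReplayMemory(capacity=100, seed=1)
    sizes = []
    for step in range(5):
        # Stand-in for a chunk of newly labeled structures from the stream.
        chunk = [(step, i) for i in range(50)]
        batch = make_training_batch(chunk, memory, n_replay=20)
        # A hypothetical train_step(model, batch) would go here.
        sizes.append(len(batch))
    print(sizes)  # [50, 70, 70, 70, 70]
```

Replaying stored samples alongside new ones is what keeps the model from forgetting earlier regions of the data without retraining from scratch. Reservoir sampling is only one simple way to bound the memory; selecting samples by prediction error or structural diversity would be equally plausible choices.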