Lee Minhyeok, Ucak Umit V, Jeong Jinyoung, Ashyrmamatov Islambek, Lee Juyong, Sim Eunji
Department of Chemistry, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea.
Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea.
Adv Sci (Weinh). 2025 Mar;12(9):e2409009. doi: 10.1002/advs.202409009. Epub 2025 Jan 13.
Machine learning interatomic potentials (MLIPs) promise quantum-level accuracy at classical force field speeds, but their performance hinges on the quality and diversity of the training data. This work presents an efficient, fully automated approach for sampling chemical reaction space without relying on human intuition, addressing a critical gap in MLIP development. The method combines the speed of tight-binding calculations with selective high-level refinement, generating diverse datasets that capture both equilibrium and reactive regions of potential energy surfaces. Single-ended growing string and nudged elastic band methods are used to systematically explore reaction pathways that were previously underrepresented in MLIP training sets, particularly near transition states. The resulting datasets have rich structural and chemical diversity, which is essential for robust MLIP development. Open-source code is provided for the entire workflow, facilitating integration of the approach into existing MLIP development pipelines.
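To illustrate why nudged elastic band (NEB) calculations place structures near transition states, the following is a minimal sketch on a toy two-dimensional potential. It is not the authors' code or workflow: the potential, the function names (`energy`, `gradient`, `neb`), and all parameters (`n_images`, `k_spring`, `step`, `n_steps`) are assumptions chosen for illustration, and the simple bisecting tangent and fixed-step relaxation stand in for the more refined schemes used in production NEB codes. In the workflow described in the abstract, energies and forces would come from tight-binding calculations rather than an analytic function.

```python
import math


def energy(x, y):
    """Hypothetical toy potential: minima at (-1, 0) and (1, 0), saddle at (0, 0.5)."""
    return (x * x - 1.0) ** 2 + 5.0 * (y - 0.5 * (1.0 - x * x)) ** 2


def gradient(x, y):
    """Analytic gradient of the toy potential."""
    d = y - 0.5 * (1.0 - x * x)
    return (4.0 * x * (x * x - 1.0) + 10.0 * d * x, 10.0 * d)


def neb(start, end, n_images=9, k_spring=5.0, step=0.01, n_steps=3000):
    """Relax a chain of images between two fixed endpoints (illustrative only)."""
    # Initial guess: straight-line interpolation between the endpoints.
    path = [
        (start[0] + (end[0] - start[0]) * i / (n_images - 1),
         start[1] + (end[1] - start[1]) * i / (n_images - 1))
        for i in range(n_images)
    ]
    for _ in range(n_steps):
        new_path = list(path)  # endpoints are copied and never moved
        for i in range(1, n_images - 1):
            (xp, yp), (x, y), (xn, yn) = path[i - 1], path[i], path[i + 1]
            # Tangent estimated from the two neighbouring images.
            tx, ty = xn - xp, yn - yp
            norm = math.hypot(tx, ty)
            tx, ty = tx / norm, ty / norm
            # True force with its component along the tangent projected out.
            gx, gy = gradient(x, y)
            g_par = gx * tx + gy * ty
            fx, fy = -(gx - g_par * tx), -(gy - g_par * ty)
            # Spring force acts only along the tangent and keeps images spaced.
            f_spring = k_spring * (
                math.hypot(xn - x, yn - y) - math.hypot(x - xp, y - yp)
            )
            fx += f_spring * tx
            fy += f_spring * ty
            new_path[i] = (x + step * fx, y + step * fy)
        path = new_path
    return path


path = neb((-1.0, 0.0), (1.0, 0.0))
energies = [energy(x, y) for x, y in path]
barrier = max(energies)  # energy of the highest image along the relaxed path
```

Because the intermediate images relax toward the minimum energy path while the springs keep them spread out, the relaxed chain includes geometries on both sides of the barrier and close to the saddle point. These are the kinds of reactive-region structures the abstract describes as underrepresented in MLIP training sets.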