Lindsey Rebecca K, Fried Laurence E, Goldman Nir, Bastea Sorin
Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA.
J Chem Phys. 2020 Oct 7;153(13):134117. doi: 10.1063/5.0021965.
Machine-learned reactive force fields based on polynomial expansions have been shown to be highly effective for describing simulations involving reactive materials. Nevertheless, the highly flexible nature of these models can give rise to a large number of candidate parameters for complicated systems. In these cases, reliable parameterization requires a well-formed training set, which can be difficult to achieve through standard iterative fitting methods. Here, we present an active learning approach based on cluster analysis and inspired by Shannon information theory to enable semi-automated generation of informative training sets and robust machine-learned force fields. We demonstrate this tool by developing a model based on linear combinations of Chebyshev polynomials that explicitly describes up to four-body interactions, for a chemically and structurally diverse C/O system under extreme conditions. We show that this flexible training database management approach enables development of models exhibiting excellent agreement with Kohn-Sham density functional theory in terms of structure, dynamics, and speciation.
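The abstract describes a model built from linear combinations of Chebyshev polynomials. Because the energy in such a model is linear in the polynomial coefficients, those coefficients can in principle be fit by ordinary linear least squares once a training set is chosen. The sketch below illustrates this idea for a single two-body (pair) term only. It is a generic illustration under stated assumptions, not the authors' model: the linear distance-to-[-1, 1] transform, the cutoffs `R_MIN`/`R_MAX`, the polynomial order `ORDER`, and the Morse-like toy reference curve standing in for DFT energies are all hypothetical. The three- and four-body terms, any smoothing or cutoff functions, and the cluster-based active learning step are omitted.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Hypothetical settings (not taken from the paper)
R_MIN, R_MAX = 1.0, 4.0  # pair-distance range in angstrom
ORDER = 8                # highest Chebyshev polynomial order

def scaled_distance(r):
    """Linearly map r in [R_MIN, R_MAX] onto s in [-1, 1] (assumed transform)."""
    return 2.0 * (r - R_MIN) / (R_MAX - R_MIN) - 1.0

def design_matrix(r):
    """Columns are T_0(s), ..., T_ORDER(s); the energy is linear in their weights."""
    return C.chebvander(scaled_distance(r), ORDER)

def fit_coefficients(r, energy):
    """Solve the linear least-squares problem A c ~= energy for the coefficients c."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(r), energy, rcond=None)
    return coeffs

def pair_energy(r, coeffs):
    """Evaluate the fitted pair energy sum_n c_n T_n(s(r))."""
    return C.chebval(scaled_distance(r), coeffs)

# Toy reference data: a smooth Morse-like curve standing in for DFT energies
r_train = np.linspace(R_MIN, R_MAX, 200)
e_train = (1.0 - np.exp(-1.5 * (r_train - 1.5))) ** 2 - 1.0

coeffs = fit_coefficients(r_train, e_train)
rmse = np.sqrt(np.mean((pair_energy(r_train, coeffs) - e_train) ** 2))
print(f"{ORDER + 1} coefficients, training RMSE = {rmse:.2e}")
```

The linearity is what makes the size and quality of the training set so important: every added polynomial order or higher-body term adds columns to the design matrix, and the fit is only well conditioned if the training configurations span the relevant distances and environments.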