Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States.
Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States.
J Chem Theory Comput. 2022 Jun 14;18(6):3566-3576. doi: 10.1021/acs.jctc.1c01111. Epub 2022 May 4.
Developing accurate classical force field representations of molecules is key to realizing the full potential of molecular simulations, both as a powerful route to fundamental insight into a broad spectrum of chemical and biological phenomena and as a means of predicting the physicochemical and mechanical properties of substances. The Open Force Field Consortium is an industry-funded open science effort toward this end, developing open-source tools for rapidly generating new high-quality small-molecule force fields. An integral aspect of this effort is the parameterization and assessment of force fields against high-quality condensed-phase physical property data, curated from open data sources such as the NIST ThermoML Archive, alongside quantum chemical data. The volume of such experimental data in open archives alone would require an onerous amount of human and computational effort to curate and estimate manually, especially when estimates must be obtained for numerous sets of force field parameters. Here, we present an entirely automated, highly scalable framework for evaluating physical properties and their gradients with respect to force field parameters. It is written as a modular and extensible Python framework that employs an intelligent multiscale estimation approach, allowing properties to be estimated automatically from both new simulations and cached simulation data, and it provides a pluggable API for estimating new properties. In this study, we demonstrate the utility of the framework by benchmarking the OpenFF 1.0.0 small-molecule force field and the GAFF 1.8 and GAFF 2.1 force fields against a test set of binary-mixture density and enthalpy of mixing measurements curated using the framework's utilities.
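The multiscale idea can be illustrated with a minimal Python sketch. It tries cheaper routes first, such as reusing cached simulation data, and falls back to new simulations only when needed. Everything here (`Estimate`, `estimate_property`, the layer factories, and the "accept if uncertainty is below a target" rule) is a hypothetical illustration, not the framework's actual API. A real cached-data route would recompute the property from stored configurations (for example, by reweighting) rather than look up a finished number.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Estimate:
    """A property estimate with its statistical uncertainty and origin."""
    value: float
    uncertainty: float
    source: str


# Hypothetical "layer": given a property key, return an Estimate or None.
Layer = Callable[[str], Optional[Estimate]]


def make_cache_layer(cache: Dict[str, Estimate]) -> Layer:
    """Stand-in for reusing stored simulation data (here a plain lookup)."""
    def layer(key: str) -> Optional[Estimate]:
        return cache.get(key)
    return layer


def make_simulation_layer(run: Callable[[str], Tuple[float, float]]) -> Layer:
    """Stand-in for running a new simulation; `run` returns (value, uncertainty)."""
    def layer(key: str) -> Optional[Estimate]:
        value, uncertainty = run(key)
        return Estimate(value, uncertainty, "simulation")
    return layer


def estimate_property(key: str, layers: List[Layer], target_uncertainty: float) -> Estimate:
    """Try layers in order (cheapest first) and accept the first estimate
    whose uncertainty meets the target; otherwise fall through to the next."""
    for layer in layers:
        result = layer(key)
        if result is not None and result.uncertainty <= target_uncertainty:
            return result
    raise RuntimeError(f"no layer estimated {key!r} within {target_uncertainty}")


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real measurements.
    cache = {
        "prop_A": Estimate(1.00, 0.001, "cache"),  # precise enough: reused
        "prop_B": Estimate(2.00, 0.100, "cache"),  # too uncertain: falls through
    }
    fake_run = lambda key: (3.00, 0.002)  # placeholder for an MD run
    layers = [make_cache_layer(cache), make_simulation_layer(fake_run)]
    for key in ("prop_A", "prop_B", "prop_C"):
        print(key, estimate_property(key, layers, target_uncertainty=0.01).source)
    # prints: prop_A cache / prop_B simulation / prop_C simulation
```

Ordering the layers from cheapest to most expensive is what makes the scheme scale. Each new parameter set reuses whatever existing data is precise enough, and only the remaining properties trigger fresh simulations.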
Further, we demonstrate the framework's use in force field optimization by coupling it with ForceBalance, a tool for systematic force field optimization, to retrain a set of nonbonded van der Waals parameters against a training set of density and enthalpy of vaporization measurements.
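Both training targets named above can be estimated from simulation averages with standard textbook relations. The sketch below shows the arithmetic. It assumes classical simulations of both phases at the same temperature, an ideal gas phase, and a negligible liquid-phase pV term, so that ΔH_vap ≈ ⟨U_gas⟩ − ⟨U_liq⟩/N + RT. The function names and unit choices are illustrative assumptions, not part of the framework or of ForceBalance.

```python
from typing import Sequence, Tuple

AVOGADRO = 6.02214076e23  # 1/mol
R_KJ = 8.314462618e-3     # gas constant, kJ/(mol K)
NM3_TO_CM3 = 1e-21        # 1 nm^3 = 1e-21 cm^3


def density_g_per_cm3(composition: Sequence[Tuple[int, float]], mean_volume_nm3: float) -> float:
    """Mass density of a (possibly mixed) box from its average volume.

    composition: (number of molecules, molar mass in g/mol) per component.
    """
    mass_g = sum(n * molar_mass for n, molar_mass in composition) / AVOGADRO
    return mass_g / (mean_volume_nm3 * NM3_TO_CM3)


def enthalpy_of_vaporization_kj_per_mol(
    mean_u_gas: float, mean_u_liquid_box: float, n_liquid: int, temperature_k: float
) -> float:
    """Approximate molar enthalpy of vaporization: <U_gas> - <U_liq>/N + R*T.

    mean_u_gas: mean potential energy of one gas-phase molecule (kJ/mol).
    mean_u_liquid_box: mean potential energy of the whole N-molecule liquid box (kJ/mol).
    Kinetic energies cancel at equal temperature; pV_gas = RT (ideal gas) and
    the liquid pV term is neglected.
    """
    if n_liquid <= 0:
        raise ValueError("n_liquid must be positive")
    return mean_u_gas - mean_u_liquid_box / n_liquid + R_KJ * temperature_k


if __name__ == "__main__":
    # Illustrative inputs only, not simulation output.
    print(round(density_g_per_cm3([(1000, 18.015)], 30.0), 4))  # 0.9972
    print(round(enthalpy_of_vaporization_kj_per_mol(-10.0, -5000.0, 100, 298.15), 3))  # 42.479
```

In an optimization loop, values like these are compared against the experimental targets for each trial parameter set. The gradients the framework provides then indicate how to adjust the van der Waals parameters.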