Bowskill D H, Tan B I, Keates A, Sugden I J, Adjiman C S, Pantelides C C
Department of Chemical Engineering, Sargent Centre for Process Systems Engineering and Institute for Molecular Science and Engineering, Imperial College London, London SW7 2AZ, U.K.
Process Studies Group, Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
J Chem Theory Comput. 2024 Nov 26;20(22):10288-10315. doi: 10.1021/acs.jctc.4c01091. Epub 2024 Nov 12.
Crystal structure prediction (CSP) seeks to identify all thermodynamically accessible solid forms of a given compound and, crucially, to establish the relative thermodynamic stability of the different polymorphs. The conventional hierarchical CSP workflow reflects the fact that no single energy model can meet the needs of every stage, so energy models spanning a range of fidelities and computational costs are required. Hybrid ab initio/empirical force-field (HAIEFF) models have demonstrated a good balance between accuracy and cost, but the force-field component presents a major bottleneck for model accuracy. Existing parameter estimation tools for fitting this empirical component are inefficient and severely limited in the problem size they can handle. Combined with a lack of reliable reference data for parameter fitting, this has caused development of the force-field component of HAIEFF models to largely stagnate. In this work, we address these barriers to progress. First, we introduce a curated database of 755 organic crystal structures, obtained using high-quality, solid-state DFT-D calculations, which provides a complete set of geometry and energy data. Comparisons with various theoretical and experimental data sources indicate that this database provides suitable diversity for parameter fitting. In tandem, we put forward a new parameter estimation algorithm, implemented as the CrystalEstimator program. Our tests demonstrate that CrystalEstimator can efficiently handle large-scale parameter estimation problems, simultaneously fitting as many as 62 model parameters based on data from 445 structures. This problem size far exceeds that of any previously reported work on CSP force-field parametrization. These developments form a strong foundation for future work on parameter estimation of transferable or tailor-made force fields for HAIEFF models. This ultimately opens the way for significant improvements in the accuracy achieved by HAIEFF models.
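To make the idea of force-field parameter estimation concrete, the following is a minimal sketch and not the CrystalEstimator algorithm, which the abstract does not describe. It assumes a Buckingham (exp-6) pair potential, a common choice for empirical repulsion-dispersion terms, and fits two of its parameters to reference energies by linear least squares. The function names, the parameter values, and the reference data are hypothetical; the "reference" energies are synthetic and stand in for the DFT-D data mentioned above.

```python
import math


def buckingham(r, A, B, C):
    """Buckingham (exp-6) pair energy: A*exp(-B*r) - C/r**6."""
    return A * math.exp(-B * r) - C / r**6


def fit_A_C(distances, ref_energies, B):
    """Least-squares estimate of A and C with B held fixed.

    With B fixed, the model is linear in (A, C), so the 2x2 normal
    equations can be solved in closed form.
    """
    f = [math.exp(-B * r) for r in distances]  # basis function multiplying A
    g = [-1.0 / r**6 for r in distances]       # basis function multiplying C
    s_ff = sum(x * x for x in f)
    s_gg = sum(x * x for x in g)
    s_fg = sum(x * y for x, y in zip(f, g))
    s_fe = sum(x * e for x, e in zip(f, ref_energies))
    s_ge = sum(x * e for x, e in zip(g, ref_energies))
    det = s_ff * s_gg - s_fg**2
    A = (s_fe * s_gg - s_ge * s_fg) / det
    C = (s_ge * s_ff - s_fe * s_fg) / det
    return A, C


# Synthetic reference data (arbitrary units); these values are assumptions
# chosen for illustration, not parameters from the paper.
TRUE_A, TRUE_B, TRUE_C = 1000.0, 3.0, 50.0
distances = [2.5 + 0.25 * i for i in range(11)]
ref_energies = [buckingham(r, TRUE_A, TRUE_B, TRUE_C) for r in distances]

A_fit, C_fit = fit_A_C(distances, ref_energies, TRUE_B)
print("fitted A:", A_fit)
print("fitted C:", C_fit)
```

Because the synthetic data are noise-free and generated from the same functional form, the fit recovers the generating parameters. A realistic problem of the kind described above involves many more parameters, nonlinear dependence on them, and lattice geometries as well as energies, which is why a dedicated algorithm is needed.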