Della Pia Flaviano, Shi Benjamin X, Kapil Venkat, Zen Andrea, Alfè Dario, Michaelides Angelos
Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK.
Department of Physics and Astronomy, University College London, London, UK.
Chem Sci. 2025 May 23. doi: 10.1039/d5sc01325a.
As with many parts of the natural sciences, machine learning interatomic potentials (MLIPs) are revolutionizing the modelling of molecular crystals. However, challenges remain for the accurate and efficient calculation of sublimation enthalpies, a key thermodynamic quantity measuring the stability of a molecular crystal. Specifically, two key stumbling blocks are: (i) the need for thousands of quality reference structures to generate training data; and (ii) the sometimes unreliable nature of density functional theory, the main technique for generating such data. Exploiting recent developments in foundation models for chemistry and materials science alongside accurate quantum diffusion Monte Carlo benchmarks offers a promising path forward. Herein, we demonstrate the generation of MLIPs capable of describing molecular crystals at finite temperature and pressure with sub-chemical accuracy, using as few as ∼200 data structures, an order-of-magnitude improvement over the current state of the art. We apply this framework to compute the sublimation enthalpies of the X23 dataset, accounting for anharmonicity and nuclear quantum effects, and achieve sub-chemical accuracy with respect to experiment. Importantly, we show that our framework can be generalized to crystals of pharmaceutical relevance, including paracetamol and aspirin. Nuclear quantum effects are also accurately captured, as shown for the case of squaric acid. By enabling accurate modelling at ambient conditions, this work paves the way for deeper insights into pharmaceutical and biological systems.
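As a minimal sketch of the quantity discussed above, the sublimation enthalpy can be written as the enthalpy of one gas-phase molecule minus the crystal enthalpy per molecule. The function names below are hypothetical, and the 1 kcal/mol threshold is the conventional definition of chemical accuracy, assumed here for illustration; the paper may use a different cutoff. The enthalpies are taken as given inputs; in the work itself they would come from finite-temperature simulations that include anharmonicity and nuclear quantum effects.

```python
# Illustrative sketch only: names and thresholds are assumptions,
# not the authors' implementation.

KCAL_TO_KJ = 4.184  # 1 kcal/mol expressed in kJ/mol (thermochemical calorie)
CHEMICAL_ACCURACY_KJ_MOL = 1.0 * KCAL_TO_KJ  # conventional "chemical accuracy"


def sublimation_enthalpy(h_gas, h_crystal, n_molecules):
    """Return H_gas - H_crystal / N, in the units of the inputs.

    h_gas: enthalpy of one isolated gas-phase molecule
    h_crystal: enthalpy of the simulated crystal cell
    n_molecules: number of molecules in that cell
    """
    if n_molecules <= 0:
        raise ValueError("n_molecules must be positive")
    return h_gas - h_crystal / n_molecules


def is_sub_chemical(predicted, reference, threshold=CHEMICAL_ACCURACY_KJ_MOL):
    """True if the absolute error is below the assumed accuracy threshold."""
    return abs(predicted - reference) < threshold
```

With hypothetical inputs in kJ/mol, `sublimation_enthalpy(-100.0, -480.0, 4)` gives the per-molecule enthalpy gap between gas and crystal, and `is_sub_chemical` compares that value against an experimental reference.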