Taylor Christopher R, Butler Patrick W V, Day Graeme M
School of Chemistry, University of Southampton, Southampton, SO17 1BJ, UK.
Faraday Discuss. 2025 Jan 14;256(0):434-458. doi: 10.1039/d4fd00105b.
Computational crystal structure prediction (CSP) is an increasingly powerful technique in materials discovery, owing to its ability to reveal trends and permit insight across the possibility space of crystal structures of a candidate molecule, beyond simply the observed structure(s). In this work, we demonstrate the reliability and scalability of CSP methods for small, rigid organic molecules by performing in-depth CSP investigations for over 1000 such compounds, the largest survey of its kind to date. We show that this highly efficient force-field-based CSP approach is superbly predictive, locating 99.4% of observed experimental structures and ranking a large majority of these (74%) among the most stable possible structures (to within the uncertainty due to thermal effects). We present two examples of the insights that such large predicted datasets permit: examining the space group preferences of organic molecular crystals, and rationalising empirical rules concerning the spontaneous resolution of chiral molecules. Finally, we exploit this large and diverse dataset to develop transferable machine-learned energy potentials for the organic solid state, training a neural network lattice energy correction to force field energies that offers substantial improvements to the already impressive energy rankings, and a MACE equivariant message-passing neural network for crystal structure re-optimisation. We conclude that the excellent performance and reliability of the CSP workflow enable the creation of very large datasets of broad utility and explanatory power in materials design.
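The two headline statistics in the abstract (the fraction of experimental structures located, and the fraction of those ranked among the most stable predictions) can be illustrated with a minimal sketch. It is not the authors' workflow: the `Landscape` container, the function names, and the energy window `window_kj_mol` are hypothetical, and the abstract does not state the numerical window used for "uncertainty due to thermal effects", so it is left as a required argument.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Landscape:
    """One molecule's predicted crystal energy landscape (hypothetical container)."""
    energies: Sequence[float]   # lattice energies of predicted structures, kJ/mol
    match_index: Optional[int]  # index of the prediction matching experiment, or None


def relative_energy(landscape: Landscape) -> Optional[float]:
    """Energy of the experimental match above the global minimum, or None if not located."""
    if landscape.match_index is None:
        return None
    return landscape.energies[landscape.match_index] - min(landscape.energies)


def summarise(landscapes: Sequence[Landscape], window_kj_mol: float) -> dict:
    """Fraction of structures located, and fraction of located ones within the window."""
    rel = [relative_energy(l) for l in landscapes]
    located = [r for r in rel if r is not None]
    within = [r for r in located if r <= window_kj_mol]
    return {
        "located_fraction": len(located) / len(landscapes) if landscapes else 0.0,
        # Denominator is the located set, mirroring "a large majority of these".
        "low_energy_fraction": len(within) / len(located) if located else 0.0,
    }


# Toy data only; the energies and the 2.0 kJ/mol window are illustrative assumptions.
toy = [
    Landscape([-100.0, -98.0, -95.0], 1),  # match lies 2.0 kJ/mol above the minimum
    Landscape([-80.0, -70.0], 1),          # match lies 10.0 kJ/mol above the minimum
    Landscape([-50.0, -49.0], None),       # experimental structure not located
    Landscape([-60.0], 0),                 # match is the global minimum
]
stats = summarise(toy, window_kj_mol=2.0)
```

The second fraction is taken over located structures rather than over all molecules, matching the abstract's wording that 74% refers to the structures that were found.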