Antunes Luis M, Butler Keith T, Grau-Crespo Ricardo
Department of Chemistry, University of Reading, Whiteknights, Reading, UK.
Department of Chemistry, University College London, London, UK.
Nat Commun. 2024 Dec 6;15(1):10570. doi: 10.1038/s41467-024-54639-7.
The generation of plausible crystal structures is often the first step in predicting the structure and properties of a material from its chemical composition. However, most current methods for crystal structure prediction are computationally expensive, slowing the pace of innovation. Seeding structure prediction algorithms with high-quality generated candidates can overcome a major bottleneck. Here, we introduce CrystaLLM, a methodology for the versatile generation of crystal structures, based on the autoregressive large language modeling (LLM) of the Crystallographic Information File (CIF) format. Trained on millions of CIF files, CrystaLLM focuses on modeling crystal structures through text. CrystaLLM can produce plausible crystal structures for a wide range of inorganic compounds unseen in training, as demonstrated by ab initio simulations. Our approach challenges conventional representations of crystals, and demonstrates the potential of LLMs for learning effective models of crystal chemistry, which will lead to accelerated discovery and innovation in materials science.
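The abstract frames structure generation as autoregressive language modeling of CIF text: the file is split into tokens, and a model repeatedly predicts the next token from the tokens before it. The sketch below illustrates only that generation loop, under loudly stated assumptions. The tokenizer rules, the `data_<formula>` prompt, the `<bos>`/`<eos>` markers, and the function names are hypothetical. A count-based n-gram table with a 3-token context stands in for the trained LLM, and decoding is greedy. None of it is CrystaLLM's actual code or vocabulary.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer (an assumption, not CrystaLLM's published vocabulary):
# the "data_" header, CIF tags like "_cell_length_a", and element symbols are single
# tokens; digits and every other character (whitespace included) are one token each,
# so joining the tokens gives back the original text.
TOKEN_RE = re.compile(r"data_|_[a-z_]+|[A-Z][a-z]?|\d|\s|\S")
BOS, EOS = "<bos>", "<eos>"  # hypothetical padding / end-of-file markers
K = 3  # context length in tokens for this toy model

# Toy two-file corpus (illustrative rock-salt cell lengths in angstroms).
CORPUS = [
    "data_NaCl\n_cell_length_a 5.64\n",
    "data_KCl\n_cell_length_a 6.29\n",
]


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def train_ngram(corpus: list[str]) -> dict[tuple[str, ...], Counter]:
    """Count which token follows each K-token context (a toy next-token distribution)."""
    counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for text in corpus:
        tokens = [BOS] * K + tokenize(text) + [EOS]
        for i in range(K, len(tokens)):
            counts[tuple(tokens[i - K:i])][tokens[i]] += 1
    return counts


def generate(counts: dict[tuple[str, ...], Counter], prompt: str, max_tokens: int = 200) -> str:
    """Autoregressive greedy decoding: append the most frequent next token until EOS."""
    tokens = [BOS] * K + tokenize(prompt)
    for _ in range(max_tokens):
        candidates = counts.get(tuple(tokens[-K:]))
        if not candidates:  # unseen context: stop rather than guess
            break
        next_token = candidates.most_common(1)[0][0]  # ties go to the first-seen token
        if next_token == EOS:
            break
        tokens.append(next_token)
    return "".join(tokens[K:])  # drop the BOS padding


if __name__ == "__main__":
    model = train_ngram(CORPUS)
    print(generate(model, "data_Na"))
    print(generate(model, "data_K"))
```

With this toy model, the `data_Na` prompt reproduces the NaCl file. The `data_K` prompt writes `5.64` for KCl, because the 3-token window no longer contains `K` when the cell length is emitted. A large language model instead conditions on the whole preceding text within its context window, which is what allows a composition given in the prompt to influence every later line of the generated CIF.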