Zhao Zimo, Hu Lin, Wang Honghui
Demonstrative Software School, College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu 610059, China.
College of Physics and Electronics Engineering, Sichuan Normal University, Chengdu 610101, China.
Materials (Basel). 2025 Aug 13;18(16):3793. doi: 10.3390/ma18163793.
This study presents a fine-tuned Large Language Model approach for predicting the band gap and stability of transition metal sulfides. Our method processes textual descriptions of crystal structures directly, eliminating the complex feature engineering required by traditional ML and GNN approaches. Using a strategically selected dataset of 554 compounds from the Materials Project database, we fine-tuned GPT-3.5-turbo through nine consecutive iterations. Performance metrics improved significantly: the R value for band gap prediction increased from 0.7564 to 0.9989, while stability classification achieved an F1 score above 0.7751. The fine-tuned model demonstrated superior generalization compared to both the base GPT-3.5 and GPT-4.0 models, maintaining high accuracy across diverse material structures. This approach is particularly valuable for new material systems with limited experimental data, as it can extract meaningful features directly from text descriptions and transfer knowledge from pre-training to domain-specific tasks without relying on extensive numerical datasets.
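The abstract describes fine-tuning a chat model on textual crystal-structure descriptions paired with target properties. A minimal sketch of how one such supervised example could be serialized in the OpenAI chat fine-tuning JSONL format is shown below; the prompt wording, the example description, and the band gap value are illustrative assumptions, not the authors' actual template or data.

```python
import json

def make_example(description: str, band_gap_ev: float) -> str:
    """Serialize one (text description, band gap) pair as a chat
    fine-tuning JSONL line. The system/user/assistant wording is a
    hypothetical template, not the paper's exact prompt."""
    record = {
        "messages": [
            {
                "role": "system",
                "content": ("You are a materials scientist. Predict the "
                            "band gap in eV of the described transition "
                            "metal sulfide."),
            },
            {"role": "user", "content": description},
            # Target encoded as plain text, since the model emits tokens,
            # not numbers.
            {"role": "assistant", "content": f"{band_gap_ev:.4f}"},
        ]
    }
    return json.dumps(record)

# Illustrative Materials Project-style description (values are made up).
desc = ("MoS2 crystallizes in the hexagonal P6_3/mmc space group. "
        "Mo is bonded to six equivalent S atoms in a trigonal "
        "prismatic geometry.")
line = make_example(desc, 1.2280)

# Each JSONL line round-trips cleanly and keeps the numeric label as text.
parsed = json.loads(line)
print(parsed["messages"][2]["content"])
```

One file of such lines per iteration could then be uploaded as fine-tuning data; a stability-classification variant would simply swap the numeric assistant reply for a "stable"/"unstable" label.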