Van Herck Joren, Gil María Victoria, Jablonka Kevin Maik, Abrudan Alex, Anker Andy S, Asgari Mehrdad, Blaiszik Ben, Buffo Antonio, Choudhury Leander, Corminboeuf Clemence, Daglar Hilal, Elahi Amir Mohammad, Foster Ian T, Garcia Susana, Garvin Matthew, Godin Guillaume, Good Lydia L, Gu Jianan, Xiao Hu Noémie, Jin Xin, Junkers Tanja, Keskin Seda, Knowles Tuomas P J, Laplaza Ruben, Lessona Michele, Majumdar Sauradeep, Mashhadimoslem Hossein, McIntosh Ruaraidh D, Moosavi Seyed Mohamad, Mouriño Beatriz, Nerli Francesca, Pevida Covadonga, Poudineh Neda, Rajabi-Kochi Mahyar, Saar Kadi L, Hooriabad Saboor Fahimeh, Sagharichiha Morteza, Schmidt K J, Shi Jiale, Simone Elena, Svatunek Dennis, Taddei Marco, Tetko Igor, Tolnai Domonkos, Vahdatifar Sahar, Whitmer Jonathan, Wieland D C Florian, Willumeit-Römer Regine, Züttel Andreas, Smit Berend
Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), Rue de l'Industrie 17, CH-1951 Sion, Switzerland.
Instituto de Ciencia y Tecnología del Carbono (INCAR), CSIC, Francisco Pintado Fe 26, 33011 Oviedo, Spain.
Chem Sci. 2024 Nov 22;16(2):670-684. doi: 10.1039/d4sc04401k. eCollection 2025 Jan 2.
The current generation of large language models (LLMs) has limited chemical knowledge. Recently, it has been shown that these LLMs can learn and predict chemical properties through fine-tuning. Using natural language to train machine learning models opens doors to a wider chemical audience, as field-specific featurization techniques can be omitted. In this work, we explore the potential and limitations of this approach. We studied the performance of fine-tuning three open-source LLMs (GPT-J-6B, Llama-3.1-8B, and Mistral-7B) on a range of different chemical questions. We benchmarked their performance against "traditional" machine learning models and found that, in most cases, the fine-tuning approach is superior for a simple classification problem. Depending on the size of the dataset and the type of question, we also successfully addressed more sophisticated problems. The most important conclusions of this work are that, for all datasets considered, their conversion into an LLM fine-tuning training set is straightforward and that fine-tuning with even relatively small datasets leads to predictive models. These results suggest that the systematic use of LLMs to guide experiments and simulations will be a powerful technique in any research study, significantly reducing unnecessary experiments or computations.
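As a rough illustration of the "straightforward conversion" mentioned above, the sketch below shows one possible way to turn a tabular chemical dataset into natural-language prompt/completion pairs for fine-tuning. The file name, column names ("smiles", "is_soluble"), and question wording are illustrative assumptions, not the authors' exact format.

```python
# Minimal sketch (assumed format, not the paper's exact pipeline):
# convert a CSV of molecules and labels into a JSONL fine-tuning set.
import csv
import json

def row_to_example(smiles: str, label: str) -> dict:
    """Phrase one dataset row as a natural-language question/answer pair."""
    return {
        "prompt": f"Is the molecule with SMILES {smiles} water-soluble? Answer yes or no.",
        "completion": " yes" if label == "1" else " no",
    }

# "solubility.csv" and its columns are hypothetical placeholders.
with open("solubility.csv", newline="") as f_in, open("train.jsonl", "w") as f_out:
    for row in csv.DictReader(f_in):
        f_out.write(json.dumps(row_to_example(row["smiles"], row["is_soluble"])) + "\n")
```

Each line of the resulting JSONL file can then be used directly in a standard causal-language-model fine-tuning loop, with no molecule-specific featurization required.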