Pankaj Kumar, Saurabh Kabra, Jacqueline M. Cole
Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot OX11 0QX, U.K.
J Chem Inf Model. 2025 Feb 24;65(4):1873-1888. doi: 10.1021/acs.jcim.4c00857. Epub 2025 Jan 31.
Language models are transforming materials-aware natural-language processing by enabling the extraction of dynamic, context-rich information from unstructured text, thus moving beyond the limitations of traditional information-extraction methods. Moreover, small language models are on the rise because some of them can outperform large language models (LLMs) on domain-specific question-answering tasks, especially in application areas that rely on a highly specialized vernacular, such as materials science. We therefore present a new class of MechBERT language models for understanding mechanical stress and strain in materials. These employ Bidirectional Encoder Representations from Transformers (BERT) architectures. We showcase four MechBERT models, all of which were pretrained on a corpus of documents that are textually rich in chemicals and their stress-strain properties and were fine-tuned on question-answering tasks. We evaluated the performance of our models on domain-specific as well as general English-language question-answering tasks and also explored the influence of the size and type of BERT architecture on model performance. We find that our MechBERT models outperform BERT-based models of the same size and maintain relevance better than much larger BERT-based models on domain-specific question answering within the stress-strain engineering sector. These small language models also enable much faster processing and require far less data for pretraining, affording them greater operational efficiency and energy sustainability than LLMs.
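To illustrate the kind of extractive question answering the abstract describes, here is a minimal sketch using the Hugging Face transformers pipeline API with a BERT-family model fine-tuned on SQuAD-style QA. The checkpoint named below (distilbert-base-cased-distilled-squad) is a publicly available stand-in, not the authors' MechBERT model; the stress-strain context passage is an invented example, not data from the paper.

```python
# A hedged sketch of extractive QA over stress-strain text.
# Assumption: the transformers library (with a torch backend) is installed;
# the model ID is a public SQuAD-fine-tuned stand-in, not MechBERT itself.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",  # stand-in checkpoint
)

# Invented example passage in the style of materials-science text.
context = (
    "The alloy exhibited a yield strength of 250 MPa and an ultimate "
    "tensile strength of 410 MPa, with 12% elongation at fracture."
)
question = "What is the yield strength of the alloy?"

result = qa(question=question, context=context)
# The pipeline returns the answer span and a confidence score,
# e.g. {'answer': '250 MPa', 'score': ...}.
print(result["answer"], result["score"])
```

A domain-adapted model such as MechBERT would be used the same way, by swapping in its checkpoint identifier; the paper's point is that domain pretraining makes the extracted spans more reliable on specialized stress-strain text.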