Grieve Jack, Bartl Sara, Fuoli Matteo, Grafmiller Jason, Huang Weihang, Jawerbaum Alejandro, Murakami Akira, Perlman Marcus, Roemling Dana, Winter Bodo
Department of Linguistics and Communication, University of Birmingham, Birmingham, United Kingdom.
Front Artif Intell. 2025 Jan 13;7:1472411. doi: 10.3389/frai.2024.1472411. eCollection 2024.
In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling. We argue that, to maximize the performance and societal value of language models, it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.