The sociolinguistic foundations of language modeling.

Author information

Grieve Jack, Bartl Sara, Fuoli Matteo, Grafmiller Jason, Huang Weihang, Jawerbaum Alejandro, Murakami Akira, Perlman Marcus, Roemling Dana, Winter Bodo

Affiliation

Department of Linguistics and Communication, University of Birmingham, Birmingham, United Kingdom.

Publication information

Front Artif Intell. 2025 Jan 13;7:1472411. doi: 10.3389/frai.2024.1472411. eCollection 2024.

Abstract

In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling. We argue that, to maximize the performance and societal value of language models, it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4982/11770026/8fd207d51019/frai-07-1472411-g0001.jpg
