The sociolinguistic foundations of language modeling.

Author information

Grieve Jack, Bartl Sara, Fuoli Matteo, Grafmiller Jason, Huang Weihang, Jawerbaum Alejandro, Murakami Akira, Perlman Marcus, Roemling Dana, Winter Bodo

Affiliation

Department of Linguistics and Communication, University of Birmingham, Birmingham, United Kingdom.

Publication information

Front Artif Intell. 2025 Jan 13;7:1472411. doi: 10.3389/frai.2024.1472411. eCollection 2024.

DOI:10.3389/frai.2024.1472411
PMID:39871863
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11770026/
Abstract

In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.
