
Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING).

Authors

Tyler Kendall, Charlotte Vaughn, Charlie Farrington, Kaylynn Gunter, Jaidan McLean, Chloe Tacata, Shelby Arnson

Affiliations

Linguistics Department, University of Oregon, Eugene, OR, United States.

Language Science Center, University of Maryland, College Park, MD, United States.

Publication information

Front Artif Intell. 2021 Apr 29;4:648543. doi: 10.3389/frai.2021.648543. eCollection 2021.

DOI:10.3389/frai.2021.648543
PMID:33997775
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8117961/
Abstract

Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over a half-century. Techniques for automating the measurement and coding for a wide range of sociolinguistic data have been on the rise over recent decades, but procedures for coding some features, especially those without clearly defined acoustic correlates like (ING), have lagged behind others, such as vowels and sibilants. This paper explores computational methods for automatically coding variable (ING) in speech recordings, examining the use of automatic speech recognition procedures related to forced alignment (using the Montreal Forced Aligner) as well as supervised machine learning algorithms (linear and radial support vector machines, and random forests). Considering the automated coding of pronunciation variables like (ING) raises broader questions for sociolinguistic methods, such as how much different human analysts agree in their impressionistic codes for such variables and what data might act as the "gold standard" for training and testing of automated procedures. This paper explores several of these considerations in automated, and manual, coding of sociolinguistic variables and provides baseline performance data for automated and manual coding methods. We consider multiple ways of assessing algorithms' performance, including agreement with human coders, as well as the impact on the outcome of an analysis of (ING) that includes linguistic and social factors. Our results show promise for automated coding methods but also highlight that variability in results should be expected even with careful human coded data. All data for our study come from the public Corpus of Regional African American Language, and code and derivative datasets (including our hand-coded data) are available with the paper.
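
To make "agreement with human coders" concrete, the following is a minimal sketch of two common agreement measures, raw percent agreement and Cohen's kappa, applied to two sets of (ING) codes. The abstract does not say which agreement statistics the authors report, so the choice of kappa is an assumption, and the function names, the `"ing"`/`"in"` labels, and the ten example tokens are hypothetical illustrations, not data from the study.

```python
import math
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of tokens on which two coders assigned the same label."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty token list")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability that both coders pick the same label
    # if each chose labels independently at their own observed rates.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both coders used one and the same label throughout; kappa is undefined.
        return math.nan
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical codes for ten (ING) tokens: one human coder, one automated coder.
human_codes = ["ing", "in", "in", "ing", "in", "ing", "ing", "in", "in", "ing"]
auto_codes = ["ing", "in", "ing", "ing", "in", "ing", "in", "in", "in", "ing"]

agreement = percent_agreement(human_codes, auto_codes)
kappa = cohens_kappa(human_codes, auto_codes)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

In this toy example the coders agree on 8 of 10 tokens, but because each uses the two labels equally often, half of that agreement would be expected by chance, which is why kappa is lower than the raw agreement rate. The same comparison can be run between two human coders to estimate how much variability to expect in any "gold standard".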

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ffd/8117961/61c6222f6253/frai-04-648543-g0001.jpg

Similar articles

1
Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING).
Front Artif Intell. 2021 Apr 29;4:648543. doi: 10.3389/frai.2021.648543. eCollection 2021.
2
Advances in Completely Automated Vowel Analysis for Sociophonetics: Using End-to-End Speech Recognition Systems With DARLA.
Front Artif Intell. 2021 Sep 24;4:662097. doi: 10.3389/frai.2021.662097. eCollection 2021.
3
Performance of Forced-Alignment Algorithms on Children's Speech.
J Speech Lang Hear Res. 2021 Jun 18;64(6S):2213-2222. doi: 10.1044/2020_JSLHR-20-00268. Epub 2021 Mar 11.
4
Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning.
J Med Internet Res. 2020 Sep 15;22(9):e19133. doi: 10.2196/19133.
5
Analyzing dialect variation in historical speech corpora.
J Acoust Soc Am. 2017 Jul;142(1):406. doi: 10.1121/1.4991009.
6
Sources of Microtemporal Clustering in Sociolinguistic Sequences.
Front Artif Intell. 2019 Jun 20;2:10. doi: 10.3389/frai.2019.00010. eCollection 2019.
7
Sociolinguistic awareness and false belief in young Cantonese learners of English.
J Exp Child Psychol. 2010 Oct;107(2):188-94. doi: 10.1016/j.jecp.2010.05.001. Epub 2010 Jun 11.
8
Performance analysis of manual and automated systemized nomenclature of medicine (SNOMED) coding.
Am J Clin Pathol. 1994 Mar;101(3):253-6. doi: 10.1093/ajcp/101.3.253.
9
Automated Outcome Classification of Computed Tomography Imaging Reports for Pediatric Traumatic Brain Injury.
Acad Emerg Med. 2016 Feb;23(2):171-8. doi: 10.1111/acem.12859. Epub 2016 Jan 14.
10
Assessment of the impact of the change from manual to automated coding on mortality statistics in Australia.
Health Inf Manag. 2002;30(3):1-11.

Cited by

1
Lenition in L2 Spanish: The Impact of Study Abroad on Phonological Acquisition.
Brain Sci. 2024 Sep 21;14(9):946. doi: 10.3390/brainsci14090946.
