Structure in talker variability: How much is there and how much can it help?

Author information

Kleinschmidt Dave F

Affiliations

Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.

Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.

Publication information

Lang Cogn Neurosci. 2018;34(1):43-68. doi: 10.1080/23273798.2018.1500698. Epub 2018 Jul 30.

DOI:10.1080/23273798.2018.1500698

PMID:30619905

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6320234/

Abstract

One of the persistent puzzles in understanding human speech perception is how listeners cope with talker variability. One thing that might help listeners is structure in talker variability: rather than varying randomly, talkers of the same gender, dialect, age, etc. tend to produce language in similar ways. Listeners are sensitive to this covariation between linguistic variation and socio-indexical variables. In this paper I present new techniques based on ideal observer models to quantify (1) the amount and type of structure in talker variation (the informativity of a grouping variable), and (2) how useful such structure can be for robust speech recognition in the face of talker variability (the utility of a grouping variable). I demonstrate these techniques in two phonetic domains, word-initial stop voicing and vowel identity, and show that these domains have different amounts and types of talker variability, consistent with previous, impressionistic findings. An R package (phondisttools) accompanies this paper, and the source and data are available from osf.io/zv6e3.
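As a rough illustration of what an ideal observer does with group-specific cue distributions, the sketch below categorizes a voice-onset-time (VOT) value by Bayes' rule under two different sets of distributions. It is only a minimal sketch under stated assumptions: the Gaussian form, every parameter value, and the names `CUE_DISTRIBUTIONS`, `gaussian_pdf`, and `posterior` are hypothetical and are not taken from the paper or from the phondisttools package.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical VOT distributions in ms, as (mean, sd) per category.
# These numbers are made up for illustration, not estimated from data.
CUE_DISTRIBUTIONS = {
    "pooled":   {"b": (10.0, 12.0), "p": (60.0, 25.0)},  # all talkers lumped
    "talker_A": {"b": (5.0, 6.0),   "p": (45.0, 12.0)},  # one talker or group
}

def posterior(vot_ms, distributions, prior=None):
    """Posterior probability of each category given one VOT value."""
    categories = list(distributions)
    if prior is None:
        # Assume equal prior probability for every category.
        prior = {c: 1.0 / len(categories) for c in categories}
    unnormalized = {
        c: prior[c] * gaussian_pdf(vot_ms, *distributions[c])
        for c in categories
    }
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

if __name__ == "__main__":
    for group, dists in CUE_DISTRIBUTIONS.items():
        print(group, posterior(30.0, dists))
```

With these made-up numbers, the same 30 ms token is close to ambiguous under the pooled distributions but is categorized confidently as /p/ under the talker-specific ones, which is the kind of difference that grouping by a socio-indexical variable could in principle provide.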

Similar articles

Speech perception in children with cochlear implants: effects of lexical difficulty, talker variability, and word length.

Ann Otol Rhinol Laryngol Suppl. 2000 Dec;185:79-81. doi: 10.1177/0003489400109s1234.

Talker familiarity and the accommodation of talker variability.

Atten Percept Psychophys. 2021 May;83(4):1842-1860. doi: 10.3758/s13414-020-02203-y. Epub 2021 Jan 4.

Non-native listeners' recognition of high-variability speech using PRESTO.

J Am Acad Audiol. 2014 Oct;25(9):869-92. doi: 10.3766/jaaa.25.9.9.

The effects of indexical and phonetic variation on vowel perception in typically developing 9- to 12-year-old children.

J Speech Lang Hear Res. 2014 Apr 1;57(2):389-405. doi: 10.1044/2014_JSLHR-S-12-0248.

Listener sensitivity to individual talker differences in voice-onset-time.

J Acoust Soc Am. 2004 Jun;115(6):3171-83. doi: 10.1121/1.1701898.

Individual Talker and Token Covariation in the Production of Multiple Cues to Stop Voicing.

Phonetica. 2018;75(1):1-23. doi: 10.1159/000448809. Epub 2017 Jun 9.

Lexical and talker effects on word recognition among native and non-native listeners with normal and impaired hearing.

J Speech Lang Hear Res. 2002 Jun;45(3):585-97. doi: 10.1044/1092-4388(2002/047).

Perceptual Cue Weighting Is Influenced by the Listener's Gender and Subjective Evaluations of the Speaker: The Case of English Stop Voicing.

Front Psychol. 2022 Apr 20;13:840291. doi: 10.3389/fpsyg.2022.840291. eCollection 2022.

Some consequences of stimulus variability on speech processing by 2-month-old infants.

Cognition. 1992 Jun;43(3):253-91. doi: 10.1016/0010-0277(92)90014-9.

Cited by

Perceiving speech from a familiar speaker engages the person identity network.

PLoS One. 2025 May 14;20(5):e0322927. doi: 10.1371/journal.pone.0322927. eCollection 2025.

SingleMALD: Investigating practice effects in auditory lexical decision.

Behav Res Methods. 2025 Apr 2;57(5):136. doi: 10.3758/s13428-025-02628-z.

Resolving competing predictions in speech: How qualitatively different cues and cue reliability contribute to phoneme identification.

Atten Percept Psychophys. 2024 Apr;86(3):942-961. doi: 10.3758/s13414-024-02849-y. Epub 2024 Feb 22.

Gender stereotypes and social perception of vocal confidence is mitigated by salience of socio-indexical cues to gender.

Front Psychol. 2023 Dec 14;14:1125164. doi: 10.3389/fpsyg.2023.1125164. eCollection 2023.

Evaluating normalization accounts against the dense vowel space of Central Swedish.

Front Psychol. 2023 Jun 21;14:1165742. doi: 10.3389/fpsyg.2023.1165742. eCollection 2023.

Right Posterior Temporal Cortex Supports Integration of Phonetic and Talker Information.

Neurobiol Lang (Camb). 2023 Mar 8;4(1):145-177. doi: 10.1162/nol_a_00091. eCollection 2023.

The Role of the Right Hemisphere in Processing Phonetic Variability Between Talkers.

Neurobiol Lang (Camb). 2021 Feb 1;2(1):138-151. doi: 10.1162/nol_a_00028. eCollection 2021.

Using TMS to evaluate a causal role for right posterior temporal cortex in talker-specific phonetic processing.

Brain Lang. 2023 May;240:105264. doi: 10.1016/j.bandl.2023.105264. Epub 2023 Apr 21.

Multiple sources of acoustic variation affect speech processing efficiency.

J Acoust Soc Am. 2023 Jan;153(1):209. doi: 10.1121/10.0016611.

Modelling representations in speech normalization of prosodic cues.

Sci Rep. 2022 Aug 27;12(1):14635. doi: 10.1038/s41598-022-18838-w.
