Choi JY, Hu ER, Perrachione TK
Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave., Boston, MA, 02215, USA.
Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
Atten Percept Psychophys. 2018 Apr;80(3):784-797. doi: 10.3758/s13414-017-1395-5.
The nondeterministic relationship between speech acoustics and abstract phonemic representations poses a challenge for listeners, who must maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers than to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.
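As a concrete illustration of what one simple "model of acoustic dissimilarity between target speech sounds" might look like, the sketch below computes Euclidean distance between vowels in F1 × F2 formant space. Both the distance metric and the formant values are illustrative assumptions (rough textbook-style averages for American English vowels), not the measurements or the models reported in the paper; the sketch only shows why a pair like boot/boat is far more acoustically confusable than beet/boat, which is the kind of contrast the study manipulated.

```python
# A minimal sketch of one possible acoustic-dissimilarity model (an assumption
# for illustration, not the authors' actual model): Euclidean distance between
# vowels in F1 x F2 formant space.

import math

# Approximate steady-state formant frequencies (Hz), rough averages for an
# adult male talker of American English -- illustrative values only, not data
# from this study.
VOWEL_FORMANTS = {
    "beet (/i/)": (270, 2290),
    "boot (/u/)": (300, 870),
    "boat (/ou/)": (430, 1020),
}

def formant_distance(v1: str, v2: str) -> float:
    """Euclidean distance between two vowels in (F1, F2) space, in Hz."""
    f1a, f2a = VOWEL_FORMANTS[v1]
    f1b, f2b = VOWEL_FORMANTS[v2]
    return math.hypot(f1a - f1b, f2a - f2b)

if __name__ == "__main__":
    # The low-overlap pair (beet/boat) is far apart in formant space, while
    # the high-overlap pair (boot/boat) is close.
    for v1, v2 in [("beet (/i/)", "boat (/ou/)"), ("boot (/u/)", "boat (/ou/)")]:
        print(f"{v1} vs. {v2}: {formant_distance(v1, v2):.0f} Hz")
```

With these illustrative values, beet/boat are roughly 1,300 Hz apart while boot/boat are under 200 Hz apart; the paper's key finding is that distance-based accounts of this sort did not explain the graded mixed-talker processing cost.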