Kleinschmidt Dave F, Jaeger T Florian
Department of Brain and Cognitive Sciences, University of Rochester.
Departments of Brain and Cognitive Sciences, Computer Science, and Linguistics, University of Rochester.
Psychol Rev. 2015 Apr;122(2):148-203. doi: 10.1037/a0038695.
Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker's /p/ might be physically indistinguishable from another talker's /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively nonstationary world and propose that the speech perception system overcomes this challenge by (a) recognizing previously encountered situations, (b) generalizing to other situations based on previous similar experience, and (c) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (a) to (c) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on 2 critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit to known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires that listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these 2 aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension.
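The distributional (belief-updating) idea can be illustrated with a minimal sketch. The abstract does not give the model's equations, so this assumes a standard conjugate setup: each phonetic category is a Gaussian over a single cue (here voice onset time, VOT, in ms), and the listener's uncertainty about that Gaussian's mean and variance is a Normal-Inverse-Chi-squared belief. Each labeled token updates the belief incrementally, and categorization uses the resulting Student-t posterior predictive density. The class `NIXBelief`, the function `posterior_categories`, and all parameter values are hypothetical and chosen only for illustration; they are not the authors' fitted model or data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NIXBelief:
    """Hypothetical Normal-Inverse-Chi-squared belief about one category's cue distribution.

    mu: expected category mean; kappa: confidence in mu (pseudo-observations).
    sigma2: expected category variance; nu: confidence in sigma2 (pseudo-observations).
    """
    mu: float
    kappa: float
    sigma2: float
    nu: float

    def update(self, x: float) -> "NIXBelief":
        """Conjugate (incremental) update after one token x labeled with this category."""
        kappa_n = self.kappa + 1
        nu_n = self.nu + 1
        mu_n = (self.kappa * self.mu + x) / kappa_n
        # Prior sum of squares plus the squared surprise of x, discounted by prior confidence.
        ss = self.nu * self.sigma2 + (self.kappa / kappa_n) * (x - self.mu) ** 2
        return NIXBelief(mu=mu_n, kappa=kappa_n, sigma2=ss / nu_n, nu=nu_n)

    def predictive_pdf(self, x: float) -> float:
        """Posterior predictive density of x: Student-t with nu degrees of freedom."""
        nu = self.nu
        scale2 = self.sigma2 * (1 + 1 / self.kappa)
        log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                    - 0.5 * math.log(nu * math.pi * scale2))
        log_kernel = -(nu + 1) / 2 * math.log1p((x - self.mu) ** 2 / (nu * scale2))
        return math.exp(log_norm + log_kernel)


def posterior_categories(beliefs: dict, x: float) -> dict:
    """P(category | x) under equal category priors, using each category's predictive density."""
    scores = {c: b.predictive_pdf(x) for c, b in beliefs.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical prior beliefs about /b/ and /p/ VOT distributions (ms); illustrative values only.
beliefs = {
    "b": NIXBelief(mu=0.0, kappa=10.0, sigma2=15.0 ** 2, nu=10.0),
    "p": NIXBelief(mu=60.0, kappa=10.0, sigma2=15.0 ** 2, nu=10.0),
}
test_vot = 30.0  # ambiguous token halfway between the two prior means
before = posterior_categories(beliefs, test_vot)["b"]

# Exposure: a novel talker's /b/ tokens (labeled via lexical context) have unusually long VOTs.
exposure = [20.0, 25.0, 30.0, 25.0, 20.0, 30.0, 25.0, 20.0, 30.0, 25.0]
for vot in exposure:
    beliefs["b"] = beliefs["b"].update(vot)

after = posterior_categories(beliefs, test_vot)["b"]
print(f"P(/b/ | VOT={test_vot:.0f} ms): before={before:.2f}, after={after:.2f}")
```

In this sketch `kappa` and `nu` act as prior confidence (pseudo-counts): larger values make the simulated listener adapt more slowly, smaller values faster. In the toy run, exposure to a talker whose lexically labeled /b/ tokens have long VOTs shifts the believed /b/ mean from 0 ms to 12.5 ms, so the ambiguous 30 ms token moves from chance (0.5) to roughly 0.75 /b/ responses, which is the qualitative pattern of perceptual recalibration.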