Choi JY, Hu ER, Perrachione TK
Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave., Boston, MA, 02215, USA.
Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
Atten Percept Psychophys. 2018 Apr;80(3):784-797. doi: 10.3758/s13414-017-1395-5.
The nondeterministic relationship between speech acoustics and abstract phonemic representations poses a challenge for listeners, who must maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers than to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.
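As a concrete illustration of what one simple "model of acoustic dissimilarity between target speech sounds" might look like, the sketch below computes Euclidean distance between vowels in F1 × F2 formant space. Both the distance metric and the formant values are illustrative assumptions (rough textbook-style averages for American English vowels), not the measurements or the models reported in the paper; the sketch only shows why a pair like boot/boat is far more acoustically confusable than beet/boat, which is the kind of contrast the study manipulated.

```python
# A minimal sketch of one possible acoustic-dissimilarity model (an assumption
# for illustration, not the authors' actual model): Euclidean distance between
# vowels in F1 x F2 formant space.

import math

# Approximate steady-state formant frequencies (Hz), rough averages for an
# adult male talker of American English -- illustrative values only, not data
# from this study.
VOWEL_FORMANTS = {
    "beet (/i/)": (270, 2290),
    "boot (/u/)": (300, 870),
    "boat (/ou/)": (430, 1020),
}

def formant_distance(v1: str, v2: str) -> float:
    """Euclidean distance between two vowels in (F1, F2) space, in Hz."""
    f1a, f2a = VOWEL_FORMANTS[v1]
    f1b, f2b = VOWEL_FORMANTS[v2]
    return math.hypot(f1a - f1b, f2a - f2b)

if __name__ == "__main__":
    # The low-overlap pair (beet/boat) is far apart in formant space, while
    # the high-overlap pair (boot/boat) is close.
    for v1, v2 in [("beet (/i/)", "boat (/ou/)"), ("boot (/u/)", "boat (/ou/)")]:
        print(f"{v1} vs. {v2}: {formant_distance(v1, v2):.0f} Hz")
```

With these illustrative values, beet/boat are roughly 1,300 Hz apart while boot/boat are under 200 Hz apart; the paper's key finding is that distance-based accounts of this sort did not explain the graded mixed-talker processing cost.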