Seth Wiener, Chao-Yang Lee
Language Acquisition, Processing and Pedagogy Lab, Department of Modern Languages, Carnegie Mellon University, Pittsburgh, PA, United States.
Speech Processing Lab, Communication Sciences and Disorders, Ohio University, Athens, OH, United States.
Front Psychol. 2020 Feb 20;11:214. doi: 10.3389/fpsyg.2020.00214. eCollection 2020.
Spoken word recognition involves a perceptual tradeoff between reliance on the incoming acoustic signal and reliance on knowledge about likely sound categories and their co-occurrences as words. This study examined how adult second language (L2) learners navigate between acoustic-based and knowledge-based spoken word recognition when listening to highly variable, multi-talker truncated speech, and whether this perceptual tradeoff changes as L2 listeners gradually become more proficient after multiple months of structured classroom learning. First language (L1) Mandarin Chinese listeners and L1 English-L2 Mandarin adult listeners took part in a gating experiment. The L2 listeners were tested twice: once at the start of their intermediate/advanced L2 language class and again 2 months later. L1 listeners were tested only once. Participants were asked to identify syllable-tone words that varied in syllable token frequency (high/low according to a spoken word corpus) and syllable-conditioned tonal probability (most probable/least probable in speech given the syllable). The stimuli were recorded by 16 different talkers and presented at eight gates ranging from onset-only (gate 1) through onset plus 40 ms increments (gates 2 through 7) to the full word (gate 8). Mixed-effects regression modeling was used to compare performance with that in our previous study, which used single-talker stimuli (Wiener et al., 2019). The results indicated that multi-talker speech caused both L1 and L2 listeners to rely more heavily on knowledge-based processing of tone. L1 listeners were able to draw on distributional knowledge of syllable-tone probabilities in early gates and switch to predominantly acoustic-based processing when more of the signal was available. In contrast, L2 listeners, with their limited experience with talker range normalization, were less able to transition effectively from probability-based to acoustic-based processing. Moreover, for the L2 listeners, the reliance on such distributional information for spoken word recognition appeared to be conditioned by the nature of the acoustic signal. Single-talker speech did not result in the same pattern of probability-based tone processing, suggesting that knowledge-based processing of L2 speech may occur only under certain acoustic conditions, such as multi-talker speech.
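As background on the stimulus measures, syllable-conditioned tonal probability is a conditional probability, P(tone | syllable), estimated as a relative frequency over corpus tokens. A minimal sketch of how such a measure can be computed, assuming a toy token list rather than the spoken word corpus actually used in the study:

```python
from collections import Counter

# Hypothetical syllable-tone tokens; real counts would come from a spoken
# word corpus of Mandarin, as in the study.
tokens = [("ma", 1), ("ma", 1), ("ma", 3), ("shi", 4), ("shi", 4), ("shi", 2)]

syllable_counts = Counter(syllable for syllable, tone in tokens)
pair_counts = Counter(tokens)

def tonal_probability(syllable, tone):
    """P(tone | syllable): relative frequency of a tone among all tokens
    of a given syllable."""
    return pair_counts[(syllable, tone)] / syllable_counts[syllable]

print(tonal_probability("ma", 1))  # 2 of 3 "ma" tokens carry tone 1 -> 0.667
```

A word whose tone is the most frequent tone for its syllable is "most probable" in the abstract's terms; the least frequent tone for that syllable yields a "least probable" word.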
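The gating manipulation itself is a truncation scheme over each recorded word. The sketch below shows one plausible way to generate such fragments from a WAV file using only the Python standard library; the function name, the per-item onset_ms parameter, and the output file names are illustrative assumptions, not details from the study:

```python
import wave

def make_gates(path, onset_ms, step_ms=40, n_gates=8):
    """Slice one recorded word into gated fragments: gate 1 = onset only,
    gates 2-7 = onset plus successive 40 ms increments, gate 8 = full word.
    onset_ms (the measured duration of the word's onset) is a placeholder
    that would be determined per item."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        width = w.getsampwidth()
        channels = w.getnchannels()
        frames = w.readframes(w.getnframes())
    frame_bytes = width * channels
    for g in range(1, n_gates + 1):
        if g == n_gates:
            chunk = frames                                # gate 8: full word
        else:
            ms = onset_ms + step_ms * (g - 1)             # gate 1: onset only
            chunk = frames[: int(rate * ms / 1000) * frame_bytes]
        with wave.open(f"gate{g}.wav", "wb") as out:
            out.setnchannels(channels)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(chunk)
```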
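For the analysis step, a minimal sketch of a mixed-effects regression over trial-level accuracy, assuming a hypothetical data file and column names; statsmodels fits a linear mixed model with by-subject random intercepts here, purely to illustrate the general form, whereas the study's actual specification (fixed effects, link function, random-effects structure) is not given in the abstract:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial-level data: one row per subject x item x gate,
# with a binary word-identification accuracy column.
data = pd.read_csv("gating_trials.csv")  # hypothetical file

# Fixed effects for frequency, tonal probability, and gate, with
# by-subject random intercepts. A logistic mixed model (e.g., lme4's
# glmer in R) would be more standard for binary accuracy.
model = smf.mixedlm(
    "accuracy ~ frequency * probability * gate",
    data,
    groups=data["subject"],
).fit()
print(model.summary())
```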