Lim Sung-Joo, Shinn-Cunningham Barbara G, Perrachione Tyler K
Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA.
Biomedical Engineering, Boston University, Boston, MA, USA.
Atten Percept Psychophys. 2019 May;81(4):1167-1177. doi: 10.3758/s13414-019-01684-w.
Speech processing is slower and less accurate when listeners encounter speech from multiple talkers than from one continuous talker. However, interference from multiple talkers has been investigated only using immediate speech recognition or long-term memory recognition tasks. These tasks reveal opposite effects of speech processing time on speech recognition: while fast processing of multi-talker speech impedes immediate recognition, it also results in more abstract and less talker-specific long-term memories for speech. Here, we investigated whether and how processing multi-talker speech disrupts working memory maintenance, an intermediate stage between perceptual recognition and long-term memory. In a digit sequence recall task, listeners encoded seven-digit sequences and recalled them after a 5-s delay. Sequences were spoken by either a single talker or multiple talkers at one of three presentation rates (0-, 200-, and 500-ms inter-digit intervals). Listeners' recall was slower and less accurate for sequences spoken by multiple talkers than by a single talker. Especially for the fastest presentation rate, listeners were less efficient when recalling sequences spoken by multiple talkers. Our results reveal that talker-specificity effects for speech working memory are most prominent when listeners must rapidly encode speech. These results suggest that, like immediate speech recognition, working memory for speech is susceptible to interference from variability across talkers. While many studies ascribe effects of talker variability to the need to calibrate perception to talker-specific acoustics, these results are also consistent with the idea that a sudden change of talkers disrupts attentional focus, interfering with efficient working-memory processing.