Department of Psychology, The University of Chicago, Chicago, IL, USA.
Front Psychol. 2014 Jul 16;5:698. doi: 10.3389/fpsyg.2014.00698. eCollection 2014.
A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.