

Visual Enhancement of Relevant Speech in a 'Cocktail Party'.

Affiliations

1Center for Mind and Brain, University of California, Davis, 95618, USA.

2Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA.

Publication information

Multisens Res. 2020 Feb 18;33(3):277-294. doi: 10.1163/22134808-20191423. Print 2020 Feb 28.

Abstract

Lip-reading improves intelligibility in noisy acoustical environments. We hypothesized that watching mouth movements benefits speech comprehension in a 'cocktail party' by strengthening the encoding of the neural representations of the visually paired speech stream. In an audiovisual (AV) task, EEG was recorded as participants watched and listened to videos of a speaker uttering a sentence while also hearing a concurrent sentence by a speaker of the opposite gender. A key manipulation was that each audio sentence had a 200-ms segment replaced by white noise. To assess comprehension, subjects were tasked with transcribing the AV-attended sentence on randomly selected trials. In the auditory-only trials, subjects listened to the same sentences and completed the same task while watching a static picture of a speaker of either gender. Subjects directed their listening to the voice of the gender of the speaker in the video. We found that the N1 auditory-evoked potential (AEP) time-locked to white noise onsets was significantly more inhibited for the AV-attended sentences than for those of the auditorily-attended (A-attended) and AV-unattended sentences. N1 inhibition to noise onsets has been shown to index restoration of phonemic representations of degraded speech. These results underscore that attention and congruency in the AV setting help streamline the complex auditory scene, partly by reinforcing the neural representations of the visually attended stream, heightening the perception of continuity and comprehension.
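The key dependent measure above, the N1 auditory-evoked potential time-locked to white-noise onsets, is obtained by epoching the continuous EEG around each onset, baseline-correcting, and averaging across trials. A minimal single-channel sketch of that event-related-potential computation (NumPy; the function names, epoch window, and N1 search window are illustrative assumptions, not the authors' analysis pipeline):

```python
import numpy as np

def erp_average(eeg, onset_samples, fs, tmin=-0.1, tmax=0.4):
    """Epoch a single-channel EEG trace around event onsets and average.

    eeg: 1-D array of voltage samples; onset_samples: event indices;
    fs: sampling rate in Hz. Returns (times_in_seconds, erp_waveform).
    """
    pre = int(round(-tmin * fs))   # samples before onset
    post = int(round(tmax * fs))   # samples after onset
    epochs = []
    for s in onset_samples:
        if s - pre < 0 or s + post > len(eeg):
            continue               # skip events too close to the record edges
        epoch = eeg[s - pre:s + post].astype(float)
        epoch -= epoch[:pre].mean()  # baseline-correct on the pre-onset interval
        epochs.append(epoch)
    erp = np.mean(epochs, axis=0)
    times = (np.arange(len(erp)) - pre) / fs
    return times, erp

def n1_amplitude(times, erp, win=(0.08, 0.14)):
    """Most negative deflection in an assumed N1 latency window (s)."""
    mask = (times >= win[0]) & (times <= win[1])
    return erp[mask].min()
```

A smaller (less negative) `n1_amplitude` for the AV-attended condition than for the A-attended and AV-unattended conditions would correspond to the N1 inhibition effect the abstract reports.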


Similar articles

1. Visual Enhancement of Relevant Speech in a 'Cocktail Party'.
Multisens Res. 2020 Feb 18;33(3):277-294. doi: 10.1163/22134808-20191423. Print 2020 Feb 28.
3. Neural Mechanisms Underlying Cross-Modal Phonetic Encoding.
J Neurosci. 2018 Feb 14;38(7):1835-1849. doi: 10.1523/JNEUROSCI.1566-17.2017. Epub 2017 Dec 20.
