Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA.
Department of Biomedical Engineering and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA.
Eur J Neurosci. 2021 Nov;54(9):7301-7317. doi: 10.1111/ejn.15482. Epub 2021 Oct 22.
Speech perception is a central component of social communication. Although principally an auditory process, accurate speech perception in everyday settings is supported by meaningful information extracted from visual cues. Visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG). However, it is unknown whether visual modulation of auditory processing is a unitary phenomenon or, rather, consists of multiple functionally distinct processes. To explore this question, we examined neural responses to audiovisual speech measured from intracranially implanted electrodes in 21 patients with epilepsy. We found that visual speech modulated auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differed across frequency bands. In the theta band, visual speech suppressed the auditory response from before to after auditory speech onset (-93 to 500 ms), most strongly in the posterior STG. In the beta band, suppression was seen in the anterior STG from -311 to -195 ms before auditory speech onset and in the middle STG from -195 to 235 ms around speech onset. In the high gamma band, visual speech enhanced the auditory response from -45 to 24 ms, only in the posterior STG. We interpret the visual-induced changes prior to speech onset as reflecting crossmodal prediction of speech signals. In contrast, modulations after sound onset may reflect a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.