Center for Mind and Brain, University of California, Davis, California 95618, and
J Neurosci. 2018 Feb 14;38(7):1835-1849. doi: 10.1523/JNEUROSCI.1566-17.2017. Epub 2017 Dec 20.
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering the consonant-vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent, or AV incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/, the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/, the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived the illusory /fa/, and a reduced N1 when they perceived the illusory /ba/, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex.

The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, as evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
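To make the reported N1 comparison concrete, the following is a minimal, hypothetical sketch of how one might quantify the N1 auditory evoked potential per condition and compare illusion trials (sorted by the reported percept) with the Auditory-only /ba/ and /fa/ baselines. It uses simulated single-channel epochs; the condition names, amplitude values, channel, and the 80-150 ms measurement window are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: N1 amplitude per condition from epoched EEG.
# Simulated data and parameter choices are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
sfreq = 500                                    # sampling rate (Hz), assumed
times = np.arange(-0.1, 0.4, 1 / sfreq)        # epoch: -100 to 400 ms around sound onset

def simulate_epochs(n_trials, n1_amp):
    """Simulate single-trial EEG at one fronto-central channel with an
    N1-like negativity peaking ~100 ms after sound onset."""
    n1 = n1_amp * np.exp(-((times - 0.10) ** 2) / (2 * 0.02 ** 2))
    noise = rng.normal(0, 2.0, size=(n_trials, times.size))
    return n1 + noise                          # shape: (trials, samples), in µV

# Assumed conditions from the design: Auditory-only /ba/ and /fa/, plus
# incongruent AV trials sorted by the syllable the subject reported hearing.
# Amplitudes are made up to mimic the reported pattern (larger N1 for /fa/).
conditions = {
    "A-only /ba/":              simulate_epochs(100, n1_amp=-4.0),
    "A-only /fa/":              simulate_epochs(100, n1_amp=-6.0),
    "AV illusion, heard /fa/":  simulate_epochs(100, n1_amp=-6.0),
    "AV illusion, heard /ba/":  simulate_epochs(100, n1_amp=-4.0),
}

# N1 amplitude: most negative value of the trial-averaged ERP within an
# assumed 80-150 ms post-onset window.
win = (times >= 0.08) & (times <= 0.15)
for name, epochs in conditions.items():
    erp = epochs.mean(axis=0)                  # average across trials
    print(f"{name}: N1 = {erp[win].min():.2f} µV")
```

Under these assumptions, illusion trials heard as /fa/ yield an N1 comparable to Auditory-only /fa/, and illusion trials heard as /ba/ yield an N1 comparable to Auditory-only /ba/, which is the qualitative pattern the abstract describes.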