Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York 11549.
Feinstein Institutes for Medical Research, Manhasset, New York 11030.
J Neurosci. 2020 Oct 28;40(44):8530-8542. doi: 10.1523/JNEUROSCI.0555-20.2020. Epub 2020 Oct 6.
Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs.

Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied these mechanisms by recording the electrical activity of the human brain through electrodes implanted surgically inside the brain. We found that visual inputs can operate by directly activating auditory cortical areas, and also indirectly by modulating the strength of cortical responses to auditory input. Our results help to understand the mechanisms by which the brain merges auditory and visual speech into a unitary perception.
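The abstract refers to phase-related modulation of broadband high-frequency activity by slow oscillations. As a minimal illustrative sketch only, and not the authors' actual analysis pipeline, the Python example below shows one standard way such phase-amplitude coupling can be quantified from a single intracranial channel: Hilbert-based extraction of slow-oscillation phase and high-frequency amplitude, followed by a mean-vector-length coupling measure. All parameters (sampling rate, frequency bands) and the synthetic signal are assumptions chosen for the demonstration.

```python
# Illustrative sketch (hypothetical, not the published analysis): quantify how
# slow-oscillation phase modulates broadband high-frequency amplitude.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0                          # assumed sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)         # 60 s of synthetic "intracranial" data

# Synthetic signal: a 4 Hz slow oscillation whose phase modulates 80-150 Hz power.
rng = np.random.default_rng(0)
slow = np.sin(2 * np.pi * 4 * t)
high = (1 + 0.5 * slow) * np.sin(2 * np.pi * 110 * t)   # amplitude follows slow phase
signal = slow + 0.3 * high + 0.2 * rng.standard_normal(t.size)

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

# Slow-oscillation phase and broadband high-frequency amplitude (Hilbert transform).
phase = np.angle(hilbert(bandpass(signal, 2, 6, fs)))
amp = np.abs(hilbert(bandpass(signal, 80, 150, fs)))

# Mean vector length: magnitude of the amplitude-weighted mean phase vector,
# normalized by mean amplitude; larger values indicate stronger coupling.
mvl = np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)
print(f"phase-amplitude coupling (mean vector length): {mvl:.3f}")
```

In practice such a measure would be compared against a surrogate distribution (e.g., phase-shuffled data) to assess significance; that step is omitted here for brevity.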