Michael J. Crosse, Edmund C. Lalor
School of Engineering, Trinity College Dublin, Dublin, Ireland
J Neurophysiol. 2014 Apr;111(7):1400-8. doi: 10.1152/jn.00690.2013. Epub 2014 Jan 8.
Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information.
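The forward-mapping idea described above — regressing neural data onto time-lagged copies of the speech envelope (a temporal response function, or TRF) — can be sketched with simulated data. This is a minimal illustration, not the paper's actual analysis pipeline; the sampling rate, lag window, regularization parameter, and single simulated EEG channel are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 64                  # sampling rate in Hz (illustrative)
n = fs * 60              # 60 s of simulated data

# Toy speech envelope: smoothed (moving-average) noise.
env = np.convolve(rng.standard_normal(n), np.ones(8) / 8, mode="same")

# Ground-truth TRF: a damped oscillatory response over 0-300 ms of lags.
lags = np.arange(int(0.3 * fs))
true_trf = np.exp(-lags / 10.0) * np.sin(lags / 3.0)

# Simulated single-channel EEG = envelope convolved with TRF, plus noise.
eeg = np.convolve(env, true_trf, mode="full")[:n]
eeg += 0.5 * rng.standard_normal(n)

# Lagged design matrix: X[t, k] = env[t - k].
X = np.zeros((n, lags.size))
for k in lags:
    X[k:, k] = env[: n - k]

# Ridge-regularized least squares: w = (X'X + lam*I)^-1 X'y.
lam = 10.0
w = np.linalg.solve(X.T @ X + lam * np.eye(lags.size), X.T @ eeg)

# The peak lag of the estimated TRF approximates the response latency;
# comparing such latencies across conditions is the spirit of the
# audiovisual-vs-audio-only contrast reported in the abstract.
print("true peak lag (samples):", int(true_trf.argmax()))
print("estimated peak lag (samples):", int(w.argmax()))
```

The reverse-mapping (stimulus-reconstruction) analysis in the abstract is the mirror image of this: lagged EEG channels become the regressors and the envelope becomes the target.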