Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA.
Department of Psychology, University of Michigan, Ann Arbor, MI, USA.
Neuroimage. 2023 Nov 15;282:120391. doi: 10.1016/j.neuroimage.2023.120391. Epub 2023 Sep 25.
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and from neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
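To make the modeling approach described above concrete, the following is a minimal sketch of a lagged linear (TRF-style) encoding comparison: predicting EEG from motion and lip-movement regressors alone versus with added categorical viseme regressors, and quantifying the gain in cross-validated prediction accuracy attributable to the visemes. The function names, lag window, regularization strength, and synthetic inputs are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (not the authors' pipeline): a lagged linear encoding model
# that predicts EEG from stimulus features and compares a "motion + lip" model
# against a model that also includes categorical viseme regressors.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def lag_features(X, n_lags):
    """Stack time-lagged copies of each feature column (lags 0 .. n_lags-1 samples)."""
    T, F = X.shape
    lagged = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        lagged[lag:, lag * F:(lag + 1) * F] = X[:T - lag]
    return lagged

def encoding_score(X, eeg, n_lags=32, alpha=1e3, n_splits=5):
    """Cross-validated correlation between predicted and measured EEG, averaged over channels."""
    Xl = lag_features(X, n_lags)
    scores = []
    for train, test in KFold(n_splits=n_splits).split(Xl):
        model = Ridge(alpha=alpha).fit(Xl[train], eeg[train])
        pred = model.predict(Xl[test])
        r = [np.corrcoef(pred[:, ch], eeg[test][:, ch])[0, 1] for ch in range(eeg.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))

# Hypothetical inputs: T time samples at the EEG sampling rate (all placeholders).
T, n_channels = 10000, 64
rng = np.random.default_rng(0)
motion_lip = rng.standard_normal((T, 2))                    # e.g., frame-wise motion + lip aperture
visemes = rng.integers(0, 2, size=(T, 12)).astype(float)    # binary viseme indicator features (illustrative)
eeg = rng.standard_normal((T, n_channels))                  # placeholder EEG

base = encoding_score(motion_lip, eeg)
full = encoding_score(np.hstack([motion_lip, visemes]), eeg)
print(f"motion+lip r = {base:.3f}; + visemes r = {full:.3f}; viseme gain = {full - base:.3f}")
```

In this kind of comparison, the "viseme gain" (the improvement of the full model over the motion/lip-only model) is the quantity that would be contrasted between novel and rehearsed videos and related to lipreading performance.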