Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA.
Curr Biol. 2024 Sep 9;34(17):4021-4032.e5. doi: 10.1016/j.cub.2024.07.073. Epub 2024 Aug 16.
Watching a speaker's face improves speech perception accuracy. This benefit is enabled, in part, by implicit lipreading abilities present in the general population. While it is established that lipreading can alter the perception of a heard word, it is unknown how these visual signals are represented in the auditory system or how they interact with auditory speech representations. One influential, but untested, hypothesis is that visual speech modulates the population-coded representations of phonetic and phonemic features in the auditory system. This model is largely supported by data showing that silent lipreading evokes activity in the auditory cortex, but these activations could alternatively reflect general effects of arousal or attention or the encoding of non-linguistic features such as visual timing information. This gap limits our understanding of how vision supports speech perception. To test the hypothesis that the auditory system encodes visual speech information, we acquired functional magnetic resonance imaging (fMRI) data from healthy adults and intracranial recordings from electrodes implanted in patients with epilepsy during auditory and visual speech perception tasks. Across both datasets, linear classifiers successfully decoded the identity of silently lipread words from the spatial pattern of auditory cortex responses. When we examined the time course of classification in the intracranial recordings, lipread words were classified at earlier time points than heard words, suggesting a predictive mechanism for facilitating speech. These results support a model in which the auditory system combines the joint neural distributions evoked by heard and lipread words to generate a more precise estimate of what was said.
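To make the decoding analysis described above concrete, the sketch below shows cross-validated linear classification of word identity from spatial activity patterns, including a time-resolved variant of the kind used to compare when lipread versus heard words become decodable. It is a minimal illustration only: the data, array dimensions, variable names (X_t, y), and classifier settings (a scikit-learn LinearSVC with 5-fold stratified cross-validation) are assumptions for demonstration, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical data: auditory-cortex responses, trials x sites x time bins,
# with one word-identity label per trial (stand-in for fMRI voxels or
# intracranial electrodes; random noise here, so accuracy will sit at chance).
rng = np.random.default_rng(0)
n_trials, n_sites, n_times, n_words = 120, 64, 50, 4
X_t = rng.standard_normal((n_trials, n_sites, n_times))
y = rng.integers(0, n_words, n_trials)

# Linear classifier with feature standardization, evaluated by
# stratified 5-fold cross-validation.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Spatial decoding: classify word identity from the time-averaged pattern.
acc_spatial = cross_val_score(clf, X_t.mean(axis=2), y, cv=cv).mean()

# Time-resolved decoding: classify from the spatial pattern at each time
# bin, tracing when word information first becomes available.
acc_time = [cross_val_score(clf, X_t[:, :, t], y, cv=cv).mean()
            for t in range(n_times)]

print(f"spatial decoding accuracy = {acc_spatial:.3f} "
      f"(chance = {1 / n_words:.3f})")
print(f"peak time-resolved accuracy = {max(acc_time):.3f}")
```

In a comparison like the one reported, the time-resolved accuracy curve would be computed separately for heard and lipread trials, and the onset of above-chance classification compared between conditions.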