Kolozsvári Orsolya B, Xu Weiyong, Leppänen Paavo H T, Hämäläinen Jarmo A
Department of Psychology, University of Jyväskylä, Jyväskylä, Finland.
Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland.
Front Hum Neurosci. 2019 Jul 12;13:243. doi: 10.3389/fnhum.2019.00243. eCollection 2019.
During speech perception, listeners rely on multimodal input and make use of both auditory and visual information. When presented with speech, for example syllables, the differences in brain responses to distinct stimuli are not, however, caused merely by the acoustic or visual features of the stimuli. The congruency of the auditory and visual information and the familiarity of a syllable, that is, whether it appears in the listener's native language or not, also modulate brain responses. We investigated how the congruency and familiarity of the presented stimuli affect brain responses to audio-visual (AV) speech in 12 adult native Finnish speakers and 12 adult native Chinese speakers. During a magnetoencephalography (MEG) measurement, participants watched videos of a Chinese speaker pronouncing syllables (/pa/, /pha/, /ta/, /tha/, /fa/); only /pa/ and /ta/ are part of Finnish phonology, while all of the stimuli are part of Chinese phonology. The stimuli were presented in audio-visual (congruent or incongruent), audio-only, or visual-only conditions. The brain responses were examined in five time windows: 75-125, 150-200, 200-300, 300-400, and 400-600 ms. We found significant differences for the congruency comparison in the fourth time window (300-400 ms) in both the sensor- and source-level analyses. Larger responses were observed for the incongruent stimuli than for the congruent stimuli. For the familiarity comparisons, no significant differences were found. The results are in line with earlier studies reporting modulation of brain responses by audio-visual congruency around 250-500 ms. This suggests a much stronger process for the general detection of a mismatch between predictions based on lip movements and the auditory signal than for the top-down modulation of brain responses based on phonological information.
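As a concrete illustration of the time-window analysis described above, the following is a minimal Python/NumPy sketch of how mean evoked amplitudes could be extracted per channel in the five reported windows and compared between congruent and incongruent AV conditions. The sampling rate, epoch limits, channel count, and all variable names are assumptions introduced here for illustration; the abstract does not specify the authors' analysis pipeline.

```python
import numpy as np

SFREQ = 1000.0   # sampling rate in Hz (assumed)
TMIN = -0.2      # epoch onset relative to the stimulus, in seconds (assumed)

# The five time windows reported in the abstract, in seconds.
TIME_WINDOWS = [(0.075, 0.125), (0.150, 0.200), (0.200, 0.300),
                (0.300, 0.400), (0.400, 0.600)]

def window_means(evoked: np.ndarray) -> np.ndarray:
    """Mean amplitude per channel within each time window.

    evoked: array of shape (n_channels, n_samples), e.g. an averaged
    MEG response for one condition (congruent or incongruent AV speech).
    Returns an array of shape (n_windows, n_channels).
    """
    means = []
    for start, stop in TIME_WINDOWS:
        # Convert window edges from seconds to sample indices.
        i0 = int(round((start - TMIN) * SFREQ))
        i1 = int(round((stop - TMIN) * SFREQ))
        means.append(evoked[:, i0:i1].mean(axis=1))
    return np.stack(means)

# Example with synthetic data: 306 MEG channels, epochs from -0.2 to 0.6 s.
rng = np.random.default_rng(0)
congruent = rng.normal(size=(306, 801))
incongruent = rng.normal(size=(306, 801))

# One difference value per window and channel; in the study, the reliable
# congruency effect fell in the fourth window (300-400 ms).
diff = window_means(incongruent) - window_means(congruent)
print(diff.shape)  # (5, 306)
```

In practice these per-window means would feed into the statistical comparison at the sensor and source level, rather than a simple subtraction as shown here.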