Donders Institute for Brain, Cognition, and Behaviour, Centre for Cognition, Montessorilaan 3, Radboud University, Nijmegen, HR, The Netherlands.
Max Planck Institute for Psycholinguistics, Nijmegen, XD, The Netherlands.
Hum Brain Mapp. 2021 Mar;42(4):1138-1152. doi: 10.1002/hbm.25282. Epub 2020 Nov 18.
During communication in real-life settings, the brain integrates information from auditory and visual modalities to form a unified percept of our environment. In the current magnetoencephalography (MEG) study, we used rapid invisible frequency tagging (RIFT) to generate steady-state evoked fields and investigated the integration of audiovisual information in a semantic context. We presented participants with videos of an actress uttering action verbs (auditory; tagged at 61 Hz) accompanied by a gesture (visual; tagged at 68 Hz, using a projector with a 1,440 Hz refresh rate). Integration difficulty was manipulated by lower-order auditory factors (clear/degraded speech) and higher-order visual factors (congruent/incongruent gesture). We identified MEG spectral peaks at the individual tagging frequencies (61 and 68 Hz). We furthermore observed a peak at the intermodulation frequency of the auditory and visually tagged signals (f_visual - f_auditory = 68 - 61 = 7 Hz), specifically when lower-order integration was easiest because signal quality was optimal. This intermodulation peak is a signature of nonlinear audiovisual integration, and was strongest in left inferior frontal gyrus and left temporal regions, areas known to be involved in speech-gesture integration. The enhanced power at the intermodulation frequency thus reflects the ease of lower-order audiovisual integration and demonstrates that speech-gesture information interacts in higher-order language areas. Furthermore, we provide a proof of principle for the use of RIFT to study the integration of audiovisual stimuli in relation to, for instance, semantic context.
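The intermodulation logic can be illustrated with a minimal simulation (not the authors' analysis pipeline; sampling rate, duration, and the multiplicative nonlinearity are assumptions for illustration): a purely linear system driven by 61 Hz and 68 Hz inputs contains power only at those frequencies, whereas a nonlinear interaction, modeled here as a product term, additionally produces power at the difference (68 - 61 = 7 Hz) and sum (129 Hz) frequencies.

```python
import numpy as np

# Illustrative sketch: a nonlinear (multiplicative) interaction between two
# frequency-tagged signals at 61 Hz (auditory) and 68 Hz (visual) yields
# intermodulation components at 7 Hz (difference) and 129 Hz (sum).
fs = 1200                       # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)    # 10 s of signal

auditory = np.sin(2 * np.pi * 61 * t)   # auditory tag at 61 Hz
visual = np.sin(2 * np.pi * 68 * t)     # visual tag at 68 Hz

# A linear response would contain only 61 and 68 Hz; the product term
# models a nonlinear integration stage.
response = auditory + visual + 0.5 * auditory * visual

# Power spectrum via FFT (exact-bin frequencies, so no spectral leakage)
spectrum = np.abs(np.fft.rfft(response)) ** 2
freqs = np.fft.rfftfreq(len(response), 1 / fs)

def power_at(f_hz):
    """Return spectral power at the bin closest to f_hz."""
    return spectrum[np.argmin(np.abs(freqs - f_hz))]

# Peaks appear at the tagging and intermodulation frequencies.
for f in (7, 61, 68, 129):
    print(f"{f:3d} Hz: {power_at(f):.1f}")
```

Because sin(a)sin(b) = [cos(a-b) - cos(a+b)]/2, the product term contributes exactly the 7 Hz and 129 Hz components, which is why power at the intermodulation frequency indexes nonlinear integration of the two tagged inputs.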