
A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts.

Affiliations

Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas, United States of America.

Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, Texas, United States of America.

Publication Information

PLoS Comput Biol. 2024 Sep 9;20(9):e1012433. doi: 10.1371/journal.pcbi.1012433. eCollection 2024 Sep.

Abstract

Communication in the real world is inherently multimodal. When having a conversation, typically sighted and hearing people use both auditory and visual cues to understand one another. For example, objects may make sounds as they move in space, or we may use the movement of a person's mouth to better understand what they are saying in a noisy environment. Still, many neuroscience experiments rely on unimodal stimuli to understand the encoding of sensory features in the brain. The extent to which visual information may influence the encoding of auditory information, and vice versa, in natural environments is thus unclear. Here, we addressed this question by recording scalp electroencephalography (EEG) in 11 subjects as they listened to and watched movie trailers in audiovisual (AV), visual-only (V), and audio-only (A) conditions. We then fit linear encoding models that described the relationship between the brain responses and the acoustic, phonetic, and visual information in the stimuli. We also tested whether auditory and visual feature tuning was the same when stimuli were presented in the original AV format as when the visual or auditory information was removed. In these stimuli, visual and auditory information was relatively uncorrelated, and the content included spoken narration over a scene as well as animated or live-action characters talking with and without their faces visible. For this stimulus, we found that auditory feature tuning was similar in the AV and A-only conditions, and likewise, tuning for visual information was similar whether the audio was present (AV) or removed (V-only). In a cross-prediction analysis, we investigated whether models trained on AV data predicted responses to A-only or V-only test data as well as models trained on the matching unimodal data did. Overall, prediction performance using AV training and V-only test sets was similar to using V-only training and V-only test sets, suggesting that the auditory information had a relatively small effect on the EEG. In contrast, prediction performance using AV training and A-only test sets was slightly worse than using matched A-only training and A-only test sets. This suggests that the visual information had a stronger influence on the EEG, though it made no qualitative difference in the derived feature tuning. In effect, our results show that researchers may benefit from the richness of multimodal datasets, which can be used to answer more than one research question.
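The abstract names two analysis steps that are easy to make concrete: fitting time-lagged linear encoding models (temporal response functions, or TRFs) that map stimulus features to the EEG, and a cross-prediction comparison in which a model trained on responses from the AV condition is tested on responses from a unimodal condition. Below is a minimal sketch of both steps in Python on synthetic placeholder data; the lag_features helper, the 40-sample lag window, and the Ridge alpha value are illustrative assumptions, not the paper's actual pipeline.

import numpy as np
from sklearn.linear_model import Ridge

def lag_features(X, n_lags):
    # Time-lagged design matrix: each EEG sample is modeled as a
    # weighted sum of the preceding n_lags stimulus samples (a TRF).
    T, F = X.shape
    lagged = np.zeros((T, F * n_lags))
    for k in range(n_lags):
        lagged[k:, k * F:(k + 1) * F] = X[:T - k]
    return lagged

# Synthetic stand-ins for the real recordings: acoustic features
# (e.g., a spectrogram) sampled at the EEG rate, plus one EEG channel
# recorded in the AV condition and one in the A-only condition.
rng = np.random.default_rng(0)
T, F, n_lags = 5000, 16, 40              # ~0-0.3 s of lags at 128 Hz (assumed)
acoustic = rng.standard_normal((T, F))
eeg_av = rng.standard_normal(T)          # response to the audiovisual stimulus
eeg_a = rng.standard_normal(T)           # response to the audio-only stimulus

X = lag_features(acoustic, n_lags)
split = int(0.8 * T)                     # simple train/test split for the sketch

# Encoding model fit on AV-condition responses; alpha would be chosen
# by cross-validation in a real analysis.
model_av = Ridge(alpha=1e3).fit(X[:split], eeg_av[:split])
# Matched control: trained and tested entirely within the A-only condition.
model_a = Ridge(alpha=1e3).fit(X[:split], eeg_a[:split])

def pearson_r(model, X_test, y_test):
    # Prediction accuracy as the correlation between predicted and
    # recorded EEG on held-out data.
    return np.corrcoef(model.predict(X_test), y_test)[0, 1]

# Cross-prediction: does the AV-trained model predict A-only responses
# as well as the model trained on A-only data does?
r_cross = pearson_r(model_av, X[split:], eeg_a[split:])
r_match = pearson_r(model_a, X[split:], eeg_a[split:])
print(f"AV->A r = {r_cross:.3f}, A->A r = {r_match:.3f}")

On real data, r_match exceeding r_cross for the auditory case, with the two roughly equal for the visual case, would correspond to the pattern the abstract reports.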


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4355/11412666/fb427014bf61/pcbi.1012433.g001.jpg
