
A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts.

Affiliations

Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas, United States of America.

Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, Texas, United States of America.

Publication Information

PLoS Comput Biol. 2024 Sep 9;20(9):e1012433. doi: 10.1371/journal.pcbi.1012433. eCollection 2024 Sep.

Abstract

Communication in the real world is inherently multimodal. When having a conversation, typically sighted and hearing people use both auditory and visual cues to understand one another. For example, objects may make sounds as they move in space, or we may use the movement of a person's mouth to better understand what they are saying in a noisy environment. Still, many neuroscience experiments rely on unimodal stimuli to understand the encoding of sensory features in the brain. The extent to which visual information may influence the encoding of auditory information, and vice versa, in natural environments is thus unclear. Here, we addressed this question by recording scalp electroencephalography (EEG) in 11 subjects as they listened to and watched movie trailers in audiovisual (AV), visual-only (V), and audio-only (A) conditions. We then fit linear encoding models that described the relationship between the brain responses and the acoustic, phonetic, and visual information in the stimuli. We also tested whether auditory and visual feature tuning was the same when stimuli were presented in the original AV format as when the visual or auditory information was removed. In these stimuli, visual and auditory information was relatively uncorrelated, and the content included spoken narration over a scene as well as animated or live-action characters talking with and without their faces visible. For this stimulus, we found that auditory feature tuning was similar in the AV and A-only conditions, and likewise, tuning for visual information was similar whether the audio was present (AV) or removed (V-only). In a cross-prediction analysis, we investigated whether models trained on AV data predicted responses to A-only or V-only test data as well as models trained on the matching unimodal data did. Overall, prediction performance using AV training and V-only test sets was similar to using V-only training and V-only test sets, suggesting that the auditory information had a relatively small effect on the EEG. In contrast, prediction performance using AV training and A-only test sets was slightly worse than using matched A-only training and A-only test sets. This suggests that the visual information had a stronger influence on the EEG, though it made no qualitative difference in the derived feature tuning. In effect, our results show that researchers may benefit from the richness of multimodal datasets, which can be used to answer more than one research question.
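The abstract names two analysis steps that are easy to make concrete: fitting time-lagged linear encoding models (temporal response functions, or TRFs) that map stimulus features to the EEG, and a cross-prediction comparison in which a model trained on responses from the AV condition is tested on responses from a unimodal condition. Below is a minimal sketch of both steps in Python on synthetic placeholder data; the lag_features helper, the 40-sample lag window, and the Ridge alpha value are illustrative assumptions, not the paper's actual pipeline.

import numpy as np
from sklearn.linear_model import Ridge

def lag_features(X, n_lags):
    # Time-lagged design matrix: each EEG sample is modeled as a
    # weighted sum of the preceding n_lags stimulus samples (a TRF).
    T, F = X.shape
    lagged = np.zeros((T, F * n_lags))
    for k in range(n_lags):
        lagged[k:, k * F:(k + 1) * F] = X[:T - k]
    return lagged

# Synthetic stand-ins for the real recordings: acoustic features
# (e.g., a spectrogram) sampled at the EEG rate, plus one EEG channel
# recorded in the AV condition and one in the A-only condition.
rng = np.random.default_rng(0)
T, F, n_lags = 5000, 16, 40              # ~0-0.3 s of lags at 128 Hz (assumed)
acoustic = rng.standard_normal((T, F))
eeg_av = rng.standard_normal(T)          # response to the audiovisual stimulus
eeg_a = rng.standard_normal(T)           # response to the audio-only stimulus

X = lag_features(acoustic, n_lags)
split = int(0.8 * T)                     # simple train/test split for the sketch

# Encoding model fit on AV-condition responses; alpha would be chosen
# by cross-validation in a real analysis.
model_av = Ridge(alpha=1e3).fit(X[:split], eeg_av[:split])
# Matched control: trained and tested entirely within the A-only condition.
model_a = Ridge(alpha=1e3).fit(X[:split], eeg_a[:split])

def pearson_r(model, X_test, y_test):
    # Prediction accuracy as the correlation between predicted and
    # recorded EEG on held-out data.
    return np.corrcoef(model.predict(X_test), y_test)[0, 1]

# Cross-prediction: does the AV-trained model predict A-only responses
# as well as the model trained on A-only data does?
r_cross = pearson_r(model_av, X[split:], eeg_a[split:])
r_match = pearson_r(model_a, X[split:], eeg_a[split:])
print(f"AV->A r = {r_cross:.3f}, A->A r = {r_match:.3f}")

On real data, r_match exceeding r_cross for the auditory case, with the two roughly equal for the visual case, would correspond to the pattern the abstract reports.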


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4355/11412666/fb427014bf61/pcbi.1012433.g001.jpg
