A Deep Neural Network Trained on Congruent Audiovisual Speech Reports the McGurk Effect.

Authors

Ma Haotian, Wang Zhengjia, Zhang Xiang, Magnotti John F, Beauchamp Michael S

Affiliations

Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.

Publication

bioRxiv. 2025 Aug 24:2025.08.20.671347. doi: 10.1101/2025.08.20.671347.

DOI: 10.1101/2025.08.20.671347
PMID: 40894527
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12393562/
Abstract

In the McGurk effect, incongruent auditory and visual syllables are perceived as a third, illusory syllable. The prevailing explanation for the effect is that the illusory syllable is a consensus percept intermediate between otherwise incompatible auditory and visual representations. To test this idea, we turned to a deep neural network known as AVHuBERT that transcribes audiovisual speech with high accuracy. Critically, AVHuBERT was trained only with audiovisual speech, without exposure to McGurk stimuli or other incongruent speech. In the current study, when tested with congruent audiovisual "ba", "ga" and "da" syllables recorded from 8 different talkers, AVHuBERT transcribed them with near-perfect accuracy, and showed a human-like pattern of highest accuracy for audiovisual speech, slightly lower accuracy for auditory-only speech, and low accuracy for visual-only speech. When presented with incongruent McGurk syllables (auditory "ba" paired with visual "ga"), AVHuBERT reported the McGurk fusion percept of "da" at a rate of 25%, many-fold greater than the rate for either auditory or visual components of the McGurk stimulus presented on their own. To examine the individual variability that is a hallmark of human perception of the McGurk effect, 100 variants of AVHuBERT were constructed. Like human observers, AVHuBERT variants were consistently accurate for congruent syllables but highly variable for McGurk syllables. Similarities between the responses of AVHuBERT and humans to congruent and incongruent audiovisual speech, including the McGurk effect, suggest that DNNs may be a useful tool for interrogating the perceptual and neural mechanisms of human audiovisual speech perception.
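The fusion-rate measure described in the abstract can be sketched as a simple tally over trial-level transcriptions. The function name and the example responses below are hypothetical illustrations, not actual AVHuBERT output or the authors' analysis code:

```python
def fusion_rate(responses, fusion_syllable="da"):
    """Fraction of McGurk trials transcribed as the fusion percept.

    responses: list of syllable transcriptions, one per trial
    fusion_syllable: the illusory percept for auditory "ba" + visual "ga"
    """
    return sum(r == fusion_syllable for r in responses) / len(responses)

# Hypothetical transcriptions for 8 McGurk trials (auditory "ba" + visual "ga"):
# 2 fusion responses out of 8 would match the 25% rate reported in the abstract.
mcgurk_responses = ["ba", "da", "ba", "ba", "da", "ba", "ba", "ba"]
print(fusion_rate(mcgurk_responses))  # 0.25
```

The same tally applied separately to auditory-only and visual-only presentations would give the baseline rates that the 25% fusion rate is compared against.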

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8733/12393562/672fdbfc54be/nihpp-2025.08.20.671347v1-f0001.jpg

Similar Articles

1. A Deep Neural Network Trained on Congruent Audiovisual Speech Reports the McGurk Effect.
bioRxiv. 2025 Aug 24:2025.08.20.671347. doi: 10.1101/2025.08.20.671347.
2. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS to the Left pSTS.
Multisens Res. 2024 Aug 16;37(4-5):341-363. doi: 10.1163/22134808-bja10129.
3. The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers.
Front Neurosci. 2024 Jun 26;18:1421713. doi: 10.3389/fnins.2024.1421713. eCollection 2024.
4. The McGurk effect is similar in native Mandarin Chinese and American English speakers.
Front Psychol. 2025 Mar 28;16:1531566. doi: 10.3389/fpsyg.2025.1531566. eCollection 2025.
5. Prescription of Controlled Substances: Benefits and Risks.
6. Interventions for childhood apraxia of speech.
Cochrane Database Syst Rev. 2018 May 30;5(5):CD006278. doi: 10.1002/14651858.CD006278.pub3.
7. The agreement of phonetic transcriptions between paediatric speech and language therapists transcribing a disordered speech sample.
Int J Lang Commun Disord. 2024 Sep-Oct;59(5):1981-1995. doi: 10.1111/1460-6984.13043. Epub 2024 Jun 8.
8. Repeatedly experiencing the McGurk effect induces long-lasting changes in auditory speech perception.
Commun Psychol. 2024 Apr 3;2(1):25. doi: 10.1038/s44271-024-00073-w.
9. Seeing a Talker's Mouth Reduces the Effort of Perceiving Speech and Repairing Perceptual Mistakes for Listeners With Cochlear Implants.
Ear Hear. 2025 Jun 16. doi: 10.1097/AUD.0000000000001683.
10. Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.
Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.

References Cited in This Article

1. Variations in unisensory speech perception explain interindividual differences in McGurk illusion susceptibility.
Psychon Bull Rev. 2025 Apr 24. doi: 10.3758/s13423-025-02697-3.
2. The McGurk effect is similar in native Mandarin Chinese and American English speakers.
Front Psychol. 2025 Mar 28;16:1531566. doi: 10.3389/fpsyg.2025.1531566. eCollection 2025.
3. Multisensory integration operates on correlated input from unimodal transient channels.
Elife. 2025 Jan 22;12:RP90841. doi: 10.7554/eLife.90841.
4. Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing.
Nat Commun. 2024 Dec 4;15(1):10590. doi: 10.1038/s41467-024-54700-5.
5. The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers.
Front Neurosci. 2024 Jun 26;18:1421713. doi: 10.3389/fnins.2024.1421713. eCollection 2024.
6. Shared functional specialization in transformer-based language models and the human brain.
Nat Commun. 2024 Jun 29;15(1):5523. doi: 10.1038/s41467-024-49173-5.
7. Synthetic faces generated with the facial action coding system or deep neural networks improve speech-in-noise perception, but not as much as real faces.
Front Neurosci. 2024 May 9;18:1379988. doi: 10.3389/fnins.2024.1379988. eCollection 2024.
8. Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training.
Neurobiol Lang (Camb). 2024 Apr 1;5(1):43-63. doi: 10.1162/nol_a_00137. eCollection 2024.
9. Dissecting neural computations in the human auditory pathway using deep neural networks for speech.
Nat Neurosci. 2023 Dec;26(12):2213-2225. doi: 10.1038/s41593-023-01468-4. Epub 2023 Oct 30.
10. How multisensory neurons solve causal inference.
Proc Natl Acad Sci U S A. 2021 Aug 10;118(32). doi: 10.1073/pnas.2106235118.