Successes and critical failures of neural networks in capturing human-like speech recognition.

Author information

Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt, Germany; University of Bristol, School of Psychological Science, Bristol, United Kingdom.

University of Bristol, School of Psychological Science, Bristol, United Kingdom.

Publication information

Neural Netw. 2023 May;162:199-211. doi: 10.1016/j.neunet.2023.02.032. Epub 2023 Feb 24.

DOI: 10.1016/j.neunet.2023.02.032
PMID: 36913820

Abstract

Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition, an area ripe for such exploration, is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature relate to each other and to natural speech, (2) show the granularities at which machines exhibit out-of-distribution robustness, reproducing classical perceptual phenomena in humans, (3) identify the specific conditions where model predictions of human performance differ, and (4) demonstrate a crucial failure of all artificial systems to perceptually recover where humans do, suggesting alternative directions for theory and model building. These findings encourage a tighter synergy between the cognitive science and engineering of audition.

Similar articles

1. Successes and critical failures of neural networks in capturing human-like speech recognition.
   Neural Netw. 2023 May;162:199-211. doi: 10.1016/j.neunet.2023.02.032. Epub 2023 Feb 24.
2. On the similarities of representations in artificial and brain neural networks for speech recognition.
   Front Comput Neurosci. 2022 Dec 21;16:1057439. doi: 10.3389/fncom.2022.1057439. eCollection 2022.
3. Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem.
   PLoS Comput Biol. 2017 Sep 25;13(9):e1005617. doi: 10.1371/journal.pcbi.1005617. eCollection 2017 Sep.
4. Artificial Neural Networks Combined with the Principal Component Analysis for Non-Fluent Speech Recognition.
   Sensors (Basel). 2022 Jan 1;22(1):321. doi: 10.3390/s22010321.
5. Convergence of Artificial Intelligence and Neuroscience towards the Diagnosis of Neurological Disorders-A Scoping Review.
   Sensors (Basel). 2023 Mar 13;23(6):3062. doi: 10.3390/s23063062.
6. Deep neural network models of sensory systems: windows onto the role of task constraints.
   Curr Opin Neurobiol. 2019 Apr;55:121-132. doi: 10.1016/j.conb.2019.02.003. Epub 2019 Mar 15.
7. Modeling the categorical perception of speech sounds: a step toward biological plausibility.
   Cogn Affect Behav Neurosci. 2009 Sep;9(3):304-13. doi: 10.3758/CABN.9.3.304.
8. Brain-like emergent auditory learning: A developmental method.
   Hear Res. 2018 Dec;370:283-293. doi: 10.1016/j.heares.2018.08.010. Epub 2018 Aug 31.
9. Model metamers reveal divergent invariances between biological and artificial neural networks.
   Nat Neurosci. 2023 Nov;26(11):2017-2034. doi: 10.1038/s41593-023-01442-0. Epub 2023 Oct 16.
10. Towards reconstructing intelligible speech from the human auditory cortex.
    Sci Rep. 2019 Jan 29;9(1):874. doi: 10.1038/s41598-018-37359-z.

Cited by

1. Natural sounds can be reconstructed from human neuroimaging data using deep neural network representation.
   PLoS Biol. 2025 Jul 23;23(7):e3003293. doi: 10.1371/journal.pbio.3003293. eCollection 2025 Jul.
2. Enhanced diagnosis of planetary gear train faults based on bispectrum and attention mechanism deep convolutional generative adversarial networks.
   Sci Rep. 2025 Jul 2;15(1):22501. doi: 10.1038/s41598-025-06623-4.
3. Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing.
   Nat Commun. 2024 Dec 4;15(1):10590. doi: 10.1038/s41467-024-54700-5.
4. Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing.
   bioRxiv. 2024 Sep 16:2024.04.21.590435. doi: 10.1101/2024.04.21.590435.
5. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions.
   PLoS Biol. 2023 Dec 13;21(12):e3002366. doi: 10.1371/journal.pbio.3002366. eCollection 2023 Dec.