From birdsong to human speech recognition: Bayesian inference on a hierarchy of nonlinear dynamical systems.

Affiliations

Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Group for Neural Theory, Institute of Cognitive Studies, École Normale Supérieure, Paris, France.

Publication details

PLoS Comput Biol. 2013;9(9):e1003219. doi: 10.1371/journal.pcbi.1003219. Epub 2013 Sep 12.

DOI:10.1371/journal.pcbi.1003219
PMID:24068902
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3772045/
Abstract

Our knowledge about the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is the scarcity of experimental findings at the neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, i.e., the songbird, which faces the same challenge as humans: to learn and decode complex auditory input online. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech. We show that the resulting Bayesian model with a hierarchy of nonlinear dynamical systems can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents, an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and derive predictions for future experiments.
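The abstract describes word recognition as Bayesian inference on a hierarchy of nonlinear dynamical systems. The Python sketch below illustrates that idea in a deliberately reduced form. It is not the authors' model, and every name and parameter in it is a hypothetical choice made for illustration. The sketch rests on three assumptions. First, a three-unit Lotka-Volterra system with winnerless competition (`lv_trajectory`) stands in for an upper level that sequences a word's segments. Second, hypothetical per-word feature templates (`WORDS`, with made-up entries "ba" and "da") stand in for a lower level that maps segment activity onto spectral features. Third, observations carry Gaussian noise of known size. Recognition is done by online Bayesian model comparison (`online_posterior`): after each frame, the posterior over word hypotheses is multiplied by that frame's likelihood and then renormalized. Unlike a full Bayesian filter, the sketch assumes that all words share the same timing, and it does not infer the hidden states.

```python
import math
import random

N_UNITS = 3  # upper-level units, one per word segment (illustrative choice)

# Hypothetical lower-level templates: row i is the feature vector (e.g. four
# coarse spectral bands) emitted while upper-level unit i is active.
WORDS = {
    "ba": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    "da": [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
}


def lv_trajectory(steps=1500, dt=0.01, alpha=0.5, beta=2.0, x0=(0.9, 0.05, 0.05)):
    """Euler-integrate a three-unit May-Leonard Lotka-Volterra system.

    Stand-in upper level (an assumption of this sketch):
    dx_i/dt = x_i * (1 - sum_j rho_ij * x_j), with rho_ii = 1,
    rho_i,(i+1) = alpha < 1 and the remaining entry beta > 1. With
    alpha + beta > 2, activity passes from unit to unit (winnerless
    competition) instead of settling on a single winner.
    """
    rho = [[1.0 if i == j else alpha if j == (i + 1) % N_UNITS else beta
            for j in range(N_UNITS)] for i in range(N_UNITS)]
    x = list(x0)
    traj = []
    for _ in range(steps):
        traj.append(list(x))
        growth = [1.0 - sum(rho[i][j] * x[j] for j in range(N_UNITS))
                  for i in range(N_UNITS)]
        x = [x[i] + dt * x[i] * growth[i] for i in range(N_UNITS)]
    return traj


def predict(prototypes, x):
    """Lower level: mix the per-unit feature templates by unit activity."""
    n_feat = len(prototypes[0])
    return [sum(x[i] * prototypes[i][k] for i in range(len(x)))
            for k in range(n_feat)]


def gauss_loglik(y, mu, sd):
    """Log density of y under independent Gaussians centred on mu."""
    return sum(-0.5 * ((a - b) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
               for a, b in zip(y, mu))


def online_posterior(observations, word_models, traj, noise_sd):
    """Sequential Bayes: p(w | y_1..t) is proportional to p(w | y_1..t-1) * p(y_t | w)."""
    log_post = {w: -math.log(len(word_models)) for w in word_models}  # flat prior
    history = []
    for y, x in zip(observations, traj):
        for w, protos in word_models.items():
            log_post[w] += gauss_loglik(y, predict(protos, x), noise_sd)
        # Renormalize with log-sum-exp so the posterior stays a distribution.
        m = max(log_post.values())
        log_z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
        log_post = {w: v - log_z for w, v in log_post.items()}
        history.append({w: math.exp(v) for w, v in log_post.items()})
    return history


def simulate(word, traj, noise_sd, seed=0):
    """Generate noisy observations from one word model (a stand-in for audio)."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_sd) for v in predict(WORDS[word], x)]
            for x in traj]


if __name__ == "__main__":
    traj = lv_trajectory()
    obs = simulate("ba", traj, noise_sd=0.3)
    post = online_posterior(obs, WORDS, traj, noise_sd=0.3)
    for t in (0, 10, len(post) - 1):
        print(t, {w: round(p, 3) for w, p in post[t].items()})
```

The posterior is renormalized after every frame, so it can be read out at any point during the input. This is the sense in which recognition is online here. Making the templates in `WORDS` more similar, or raising `noise_sd`, slows the accumulation of evidence for the correct word.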

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/a0f6ec22662d/pcbi.1003219.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/3707a85410d3/pcbi.1003219.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/a3f82df5ba53/pcbi.1003219.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/7dabd9457415/pcbi.1003219.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/17fdb02f1032/pcbi.1003219.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/02fd9e8d59b3/pcbi.1003219.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/cf30cc3f458f/pcbi.1003219.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1d6/3772045/b8e275260f05/pcbi.1003219.g008.jpg

Similar articles

1
From birdsong to human speech recognition: Bayesian inference on a hierarchy of nonlinear dynamical systems.
PLoS Comput Biol. 2013;9(9):e1003219. doi: 10.1371/journal.pcbi.1003219. Epub 2013 Sep 12.
2
A hierarchical neuronal model for generation and online recognition of birdsongs.
PLoS Comput Biol. 2011 Dec;7(12):e1002303. doi: 10.1371/journal.pcbi.1002303. Epub 2011 Dec 15.
3
Sound sequences in birdsong: how much do birds really care?
Philos Trans R Soc Lond B Biol Sci. 2020 Jan 6;375(1789):20190044. doi: 10.1098/rstb.2019.0044. Epub 2019 Nov 18.
4
Recognizing recurrent neural networks (rRNN): Bayesian inference for recurrent neural networks.
Biol Cybern. 2012 Jul;106(4-5):201-17. doi: 10.1007/s00422-012-0490-x. Epub 2012 May 12.
5
Songbirds can learn flexible contextual control over syllable sequencing.
Elife. 2021 Jun 1;10:e61610. doi: 10.7554/eLife.61610.
6
Recognizing sequences of sequences.
PLoS Comput Biol. 2009 Aug;5(8):e1000464. doi: 10.1371/journal.pcbi.1000464. Epub 2009 Aug 14.
7
Brains for birds and babies: Neural parallels between birdsong and speech acquisition.
Neurosci Biobehav Rev. 2017 Oct;81(Pt B):225-237. doi: 10.1016/j.neubiorev.2016.12.035. Epub 2017 Jan 10.
8
Dynamical origin of spectrally rich vocalizations in birdsong.
Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Jul;78(1 Pt 1):011905. doi: 10.1103/PhysRevE.78.011905. Epub 2008 Jul 11.
9
Songs to syntax: the linguistics of birdsong.
Trends Cogn Sci. 2011 Mar;15(3):113-21. doi: 10.1016/j.tics.2011.01.002.
10
Neuronal Sequence Models for Bayesian Online Inference.
Front Artif Intell. 2021 May 21;4:530937. doi: 10.3389/frai.2021.530937. eCollection 2021.

Cited by

1
Fast frequency modulation is encoded according to the listener expectations in the human subcortical auditory pathway.
Imaging Neurosci (Camb). 2024 Sep 19;2. doi: 10.1162/imag_a_00292. eCollection 2024.
2
From pixels to planning: scale-free active inference.
Front Netw Physiol. 2025 Jun 18;5:1521963. doi: 10.3389/fnetp.2025.1521963. eCollection 2025.
3
Convergent neural signatures of speech prediction error are a biological marker for spoken word recognition.
Nat Commun. 2024 Nov 18;15(1):9984. doi: 10.1038/s41467-024-53782-5.
4
Federated inference and belief sharing.
Neurosci Biobehav Rev. 2024 Jan;156:105500. doi: 10.1016/j.neubiorev.2023.105500. Epub 2023 Dec 5.
5
Rhythmic modulation of prediction errors: A top-down gating role for the beta-range in speech processing.
PLoS Comput Biol. 2023 Nov 7;19(11):e1011595. doi: 10.1371/journal.pcbi.1011595. eCollection 2023 Nov.
6
How the conception of control influences our understanding of actions.
Nat Rev Neurosci. 2023 May;24(5):313-329. doi: 10.1038/s41583-023-00691-z. Epub 2023 Mar 30.
7
A deep hierarchy of predictions enables online meaning extraction in a computational model of human speech comprehension.
PLoS Biol. 2023 Mar 22;21(3):e3002046. doi: 10.1371/journal.pbio.3002046. eCollection 2023 Mar.
8
Stochastic Chaos and Markov Blankets.
Entropy (Basel). 2021 Sep 17;23(9):1220. doi: 10.3390/e23091220.
9
Active Inference and Cooperative Communication: An Ecological Alternative to the Alignment View.
Front Psychol. 2021 Aug 12;12:708780. doi: 10.3389/fpsyg.2021.708780. eCollection 2021.
10
COSMO-Onset: A Neurally-Inspired Computational Model of Spoken Word Recognition, Combining Top-Down Prediction and Bottom-Up Detection of Syllabic Onsets.
Front Syst Neurosci. 2021 Aug 4;15:653975. doi: 10.3389/fnsys.2021.653975. eCollection 2021.