Shortlist B: a Bayesian model of continuous speech recognition.

Authors

Dennis Norris, James M. McQueen

Affiliation

Medical Research Council, Cognition and Brain Sciences Unit, Cambridge, UK.

Publication information

Psychol Rev. 2008 Apr;115(2):357-95. doi: 10.1037/0033-295X.115.2.357.

DOI: 10.1037/0033-295X.115.2.357

PMID: 18426294

Abstract

A Bayesian model of continuous speech recognition is presented. It is based on Shortlist (D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract prelexical and lexical representations, a feedforward architecture with no online feedback, and a lexical segmentation algorithm based on the viability of chunks of the input as possible words. Shortlist B is radically different from its predecessor in two respects. First, whereas Shortlist was a connectionist model based on interactive-activation principles, Shortlist B is based on Bayesian principles. Second, the input to Shortlist B is no longer a sequence of discrete phonemes; it is a sequence of multiple phoneme probabilities over 3 time slices per segment, derived from the performance of listeners in a large-scale gating study. Simulations are presented showing that the model can account for key findings: data on the segmentation of continuous speech, word frequency effects, the effects of mispronunciations on word recognition, and evidence on lexical involvement in phonemic decision making. The success of Shortlist B suggests that listeners make optimal Bayesian decisions during spoken-word recognition.
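The Bayesian step the abstract describes, combining per-segment phoneme probabilities with word priors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `word_posteriors`, the toy lexicon, the prior values, and the phoneme probabilities are all hypothetical, and the sketch simplifies the model by using one probability distribution per segment (rather than 3 time slices) and by assuming word boundaries are already known.

```python
def word_posteriors(segments, lexicon):
    """Return P(word | input) for words whose length matches the input.

    segments: list of dicts mapping phoneme -> probability, one per segment
              (hypothetical stand-in for gating-derived phoneme probabilities).
    lexicon:  dict mapping word -> (tuple of phonemes, prior probability),
              where the prior could reflect word frequency.
    """
    scores = {}
    for word, (phonemes, prior) in lexicon.items():
        if len(phonemes) != len(segments):
            continue  # toy simplification: no segmentation, exact length match only
        likelihood = 1.0
        for dist, phoneme in zip(segments, phonemes):
            likelihood *= dist.get(phoneme, 0.0)
        scores[word] = prior * likelihood  # Bayes' rule numerator
    total = sum(scores.values())
    if total == 0:
        return {}
    # Normalise over the candidate set so the posteriors sum to 1.
    return {word: score / total for word, score in scores.items()}


# Hypothetical two-word lexicon with frequency-based priors.
lexicon = {
    "cat": (("k", "ae", "t"), 0.6),
    "cap": (("k", "ae", "p"), 0.4),
}
# Hypothetical input whose final segment is ambiguous between /t/ and /p/.
segments = [
    {"k": 0.9, "g": 0.1},
    {"ae": 1.0},
    {"t": 0.5, "p": 0.5},
]
posteriors = word_posteriors(segments, lexicon)
```

With the final segment equally consistent with both words, the likelihoods are identical and the posterior is decided by the priors alone, which is one simple way a word frequency effect can arise in a Bayesian account.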

Similar articles

Predictive Neural Computations Support Spoken Word Recognition: Evidence from MEG and Competitor Priming.

J Neurosci. 2021 Aug 11;41(32):6919-6932. doi: 10.1523/JNEUROSCI.1685-20.2021. Epub 2021 Jul 1.

Merging information in speech recognition: feedback is never necessary.

Behav Brain Sci. 2000 Jun;23(3):299-325; discussion 325-70. doi: 10.1017/s0140525x00003241.

Competition and segmentation in spoken-word recognition.

J Exp Psychol Learn Mem Cogn. 1995 Sep;21(5):1209-28. doi: 10.1037//0278-7393.21.5.1209.

Probability and surprisal in auditory comprehension of morphologically complex words.

Cognition. 2012 Oct;125(1):80-106. doi: 10.1016/j.cognition.2012.06.003. Epub 2012 Jul 27.

: Resolving Lexical Ambiguity with Sub-phonemic Information.

Lang Speech. 2020 Sep;63(3):526-549. doi: 10.1177/0023830919866870. Epub 2019 Aug 6.

Possible words and fixed stress in the segmentation of Slovak speech.

Q J Exp Psychol (Hove). 2010 Mar;63(3):555-79. doi: 10.1080/17470210903038958.

The Recognition of Whispered Speech in Real-Time.

Ear Hear. 2022 Mar/Apr;43(2):554-562. doi: 10.1097/AUD.0000000000001114.

Why might there be lexical-prelexical feedback in speech recognition?

Cognition. 2025 Feb;255:106025. doi: 10.1016/j.cognition.2024.106025. Epub 2024 Nov 30.

Generalization in perceptual learning for speech.

Psychon Bull Rev. 2006 Apr;13(2):262-8. doi: 10.3758/bf03193841.

Cited by

Reduced Neural Distinctiveness of Speech Representations in the Middle-Aged Brain.

Neurobiol Lang (Camb). 2025 Jun 18;6. doi: 10.1162/nol_a_00169. eCollection 2025.

Recurrent neural networks as neuro-computational models of human speech recognition.

PLoS Comput Biol. 2025 Jul 28;21(7):e1013244. doi: 10.1371/journal.pcbi.1013244. eCollection 2025 Jul.

The UCI Phonotactic Calculator: An online tool for computing phonotactic metrics.

Behav Res Methods. 2025 Jul 3;57(8):218. doi: 10.3758/s13428-025-02725-z.

Decoupling speech processing from time.

Trends Cogn Sci. 2025 Jun 25. doi: 10.1016/j.tics.2025.05.017.

Fundamental dimensions of real-time word recognition in challenging listening conditions exhibit within-subject stability and link to outcomes.

J Exp Psychol Gen. 2025 Jun 23. doi: 10.1037/xge0001788.

Exploring the dynamics of Shannon's information and iconicity in language processing and lexeme evolution.

PLoS One. 2025 Apr 29;20(4):e0321294. doi: 10.1371/journal.pone.0321294. eCollection 2025.

How Purposeful Adaptive Responses to Adverse Conditions Facilitate Successful Auditory Functioning: A Conceptual Model.

Trends Hear. 2025 Jan-Dec;29:23312165251317010. doi: 10.1177/23312165251317010. Epub 2025 Mar 16.

LDL-AURIS: a computational model, grounded in error-driven learning, for the comprehension of single spoken words.

Lang Cogn Neurosci. 2021 Jul 21;38(4):509-536. doi: 10.1080/23273798.2021.1954207. eCollection 2023.

Neural Bases of Proactive and Predictive Processing of Meaningful Subword Units in Speech Comprehension.

J Neurosci. 2025 Feb 12;45(7):e0781242024. doi: 10.1523/JNEUROSCI.0781-24.2024.

Convergent neural signatures of speech prediction error are a biological marker for spoken word recognition.

Nat Commun. 2024 Nov 18;15(1):9984. doi: 10.1038/s41467-024-53782-5.
