Bronkhorst A W, Bosman A J, Smoorenburg G F
TNO Institute for Perception, Soesterberg, The Netherlands.
J Acoust Soc Am. 1993 Jan;93(1):499-509. doi: 10.1121/1.406844.
A model is presented that quantifies the effect of context on speech recognition. In this model, a speech stimulus is considered as a concatenation of a number of equivalent elements (e.g., phonemes constituting a word). The model employs probabilities that individual elements are recognized and chances that missed elements are guessed using contextual information. Predictions are given of the probability that the entire stimulus, or part of it, is reproduced correctly. The model can be applied to both speech recognition and visual recognition of printed text. It has been verified with data obtained with syllables of the consonant-vowel-consonant (CVC) type presented near the reception threshold in quiet and in noise, with the results of an experiment using orthographic presentation of incomplete CVC syllables and with results of word counts in a CVC lexicon. A remarkable outcome of the analysis is that the cues which occur only in spoken language (e.g., coarticulatory cues) seem to have a much greater influence on recognition performance when the stimuli are presented near the threshold in noise than when they are presented near the absolute threshold. Demonstrations are given of further predictions provided by the model: word recognition as a function of signal-to-noise ratio, closed-set word recognition, recognition of interrupted speech, and sentence recognition.
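The abstract gives no formulas, so the following is only a minimal sketch of how such a model could be computed, under stated assumptions rather than the authors' exact formulation. It assumes that each of the n elements is recognized independently with the same probability `p`, and that when m elements are missed, each is guessed in turn with a context-dependent chance `c[i]`. The function name, the parameter names, and the binomial form are illustrative assumptions.

```python
from math import comb, prod


def whole_stimulus_probability(p, c):
    """Probability that an entire n-element stimulus is reproduced correctly.

    Assumptions (illustrative, not taken from the abstract):
      p -- probability that a single element is recognized without context,
           identical and independent for all elements.
      c -- list of n guessing chances; c[i - 1] is the chance of correctly
           guessing one missed element when i elements are still missing.
    """
    n = len(c)
    total = 0.0
    for m in range(n + 1):
        # Binomial probability that exactly m of the n elements are missed.
        missed_m = comb(n, m) * p ** (n - m) * (1 - p) ** m
        # All m missed elements must be filled in from context;
        # prod of an empty slice is 1, covering the m == 0 case.
        guessed_all = prod(c[:m])
        total += missed_m * guessed_all
    return total
```

With no usable context (all `c` values 0) the result reduces to `p ** n`, and with perfect context (all `c` values 1) it is 1 regardless of `p`. These two limiting cases are a quick sanity check on the sketch.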