Jelinek F
IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA.
Proc Natl Acad Sci U S A. 1995 Oct 24;92(22):9964-9. doi: 10.1073/pnas.92.22.9964.
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
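As a minimal illustration of the search step, the sketch below runs a Viterbi-style dynamic program over a toy two-state HMM and recovers the single most probable state path for an index string. It is not the recognizer described in the paper. The state names, probabilities, and the function name `viterbi` are hypothetical, and outputs are attached to states rather than to transitions to keep the example short. Log probabilities are used so that the product of factors becomes a sum.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # best[s] = log-probability of the best path that ends in state s so far
    best = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []  # one dict per time step: state -> best predecessor
    for o in obs[1:]:
        new_best, ptr = {}, {}
        for s in states:
            # pick the predecessor that maximizes the path score into s
            prev = max(states, key=lambda r: best[r] + log_trans[r][s])
            new_best[s] = best[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        best = new_best
        backptrs.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[last], path

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: two states, two acoustic index symbols (0 and 1).
states = ["A", "B"]
log_init = to_log({"A": 0.6, "B": 0.4})
log_trans = to_log({"A": {"A": 0.7, "B": 0.3},
                    "B": {"A": 0.4, "B": 0.6}})
log_emit = to_log({"A": {0: 0.9, 1: 0.1},
                   "B": {0: 0.2, 1: 0.8}})

best_log_prob, best_path = viterbi([0, 0, 1, 1], states,
                                   log_init, log_trans, log_emit)
print(best_path, math.exp(best_log_prob))
```

In a full recognizer the score of each hypothesis would also include the language model probability of the utterance, and the search space would be far too large to explore exactly, which is why the abstract speaks only of a high likelihood of finding the most probable utterance.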