IEEE Trans Cybern. 2013 Dec;43(6):2114-21. doi: 10.1109/TCYB.2013.2240450.
In distributed speech recognition applications, the frame rate of the observation sequence may be reduced to suit a resource-limited front-end device. To use models trained on full-frame-rate data for recognizing reduced-frame-rate (RFR) data, we propose a method for adapting the transition probabilities of hidden Markov models (HMMs) to match the frame rate of the observations. Experiments on the recognition of clean and noisy connected digits are conducted to evaluate the proposed method. The results show that the proposed method effectively compensates for the frame-rate mismatch between the training and test data. Using the adapted model to recognize RFR speech data significantly reduces computation time while achieving the same level of accuracy as a method that restores the frame rate by data interpolation.
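The abstract does not spell out the adaptation formula, so the following is only a minimal sketch of one natural way to match transition probabilities to a lower frame rate, and not necessarily the authors' method. It assumes that the front end keeps every `factor`-th frame, so that one step in the decimated sequence spans `factor` steps of the full-frame-rate Markov chain and the adapted matrix is the `factor`-th power of the original. The function names and the 3-state left-to-right matrix are hypothetical and chosen purely for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adapt_transitions(trans, factor):
    """Return the `factor`-step transition matrix (trans raised to `factor`).

    Assumption (not taken from the paper): the reduced-frame-rate front end
    keeps every `factor`-th frame, so one step in the decimated sequence
    corresponds to `factor` steps of the full-frame-rate chain.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    result = trans
    for _ in range(factor - 1):
        result = mat_mul(result, trans)
    return result


# Hypothetical 3-state left-to-right model; the values are illustrative only.
full_rate = [
    [0.8, 0.2, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

# Adapt the model for a front end that sends every second frame.
half_rate = adapt_transitions(full_rate, 2)
for row in half_rate:
    print([round(p, 4) for p in row])
```

Under this assumption, the adapted matrix gains a skip transition from the first state to the third even though the full-rate model has none, because a dropped frame may hide the visit to the middle state. Each row still sums to one, so the result remains a valid transition matrix that can be used with the unchanged emission distributions.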