Department of Electrical Engineering, Da-Yeh University, Dacun, Changhua, Taiwan.
Faculty of Electronics-Telecommunications, Saigon University, Ho Chi Minh City, Vietnam.
PLoS One. 2018 Nov 7;13(11):e0206916. doi: 10.1371/journal.pone.0206916. eCollection 2018.
In distributed speech recognition applications, the front-end device, which can be any handheld electronic device such as a smartphone or personal digital assistant (PDA), captures the speech signal, extracts the speech features, and sends the speech-feature vector sequence to the back-end server for decoding. Because the front-end mobile device has limited computation capacity, battery power, and bandwidth, reducing the frame rate of the speech-feature vector sequence is a feasible strategy for easing these constraints. Previously, we proposed a method that adjusts the transition probabilities of the hidden Markov model to counter the loss of recognition accuracy caused by the frame-rate mismatch between the input and the original model. We refer to that earlier method as the adapting-then-connecting approach: it adapts each model individually and then connects the adapted models to form a word network for speech recognition. We have found that this model adaptation approach introduces transitions that skip too many states, which increases the number of insertion errors. In this study, we propose an improved model adaptation approach, the connecting-then-adapting approach, which first connects the individual models to form a word network and then adapts the connected network for speech recognition. The new approach calculates the transition matrix of the connected model, adapts that transition matrix according to the frame rate, and then creates a transition arc for each transition probability. It can better align the speech-feature sequence with the states in the word network and therefore reduces the number of insertion errors. We conducted experiments to investigate the effectiveness of the new approach and analyzed the results with respect to insertion, deletion, and substitution errors. The experimental results indicate that the proposed method achieves a higher recognition rate than the previous method.
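The three steps of the connecting-then-adapting approach (build the connected transition matrix, adapt it to the frame rate, create one arc per transition probability) can be illustrated with a minimal sketch. The abstract does not give the adaptation formula, so the sketch assumes that reducing the frame rate by an integer factor k corresponds to replacing the transition matrix A with its k-step version A^k. The function names `connect`, `adapt`, and `arcs`, the toy three-state models, and the absorbing final state are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def connect(models: List[Matrix], exit_probs: List[float]) -> Matrix:
    """Chain left-to-right HMMs into a single transition matrix.

    models[i] holds the transitions among the emitting states of model i;
    exit_probs[i] is the probability of leaving its last state, which is
    routed to the first state of model i + 1.
    """
    n = sum(len(m) for m in models)
    big = [[0.0] * n for _ in range(n)]
    offset = 0
    for i, (m, p_exit) in enumerate(zip(models, exit_probs)):
        size = len(m)
        for r in range(size):
            for c in range(size):
                big[offset + r][offset + c] = m[r][c]
        last = offset + size - 1
        if i + 1 < len(models):
            big[last][offset + size] = p_exit  # arc into the next model
        else:
            # Assumption for this sketch: fold the final exit probability
            # into the self-loop so that every row still sums to 1.
            big[last][last] += p_exit
        offset += size
    return big


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adapt(a: Matrix, factor: int) -> Matrix:
    """Assumed adaptation rule: k-step transition matrix A^k (factor >= 1)."""
    result = a
    for _ in range(factor - 1):
        result = matmul(result, a)
    return result


def arcs(a: Matrix, eps: float = 1e-12) -> List[Tuple[int, int, float]]:
    """One (source, target, probability) arc per non-zero transition."""
    return [(i, j, p) for i, row in enumerate(a)
            for j, p in enumerate(row) if p > eps]


# Toy model: three emitting states, self-loop 0.6, forward 0.4, exit 0.4.
phone: Matrix = [[0.6, 0.4, 0.0],
                 [0.0, 0.6, 0.4],
                 [0.0, 0.0, 0.6]]

connected = connect([phone, phone], [0.4, 0.4])  # step 1: connect
adapted = adapt(connected, 2)                    # step 2: adapt (half frame rate)
network_arcs = arcs(adapted)                     # step 3: one arc per probability

for src, dst, prob in network_arcs:
    print(f"{src} -> {dst}: {prob:.2f}")
```

Under these assumptions, the adapted network contains an arc from the last state of the first model (state 2) directly to the second state of the second model (state 4). That arc crosses the model boundary, and it can only be derived once the models have been connected, which is the motivation the abstract gives for adapting after connecting.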