Improved model adaptation approach for recognition of reduced-frame-rate continuous speech.

Affiliations

Department of Electrical Engineering, Da-Yeh University, Dacun, Changhua, Taiwan.

Faculty of Electronics-Telecommunications, Saigon University, Ho Chi Minh City, Vietnam.

Publication information

PLoS One. 2018 Nov 7;13(11):e0206916. doi: 10.1371/journal.pone.0206916. eCollection 2018.

DOI:10.1371/journal.pone.0206916

PMID:30403736

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6221327/

Abstract

In distributed speech recognition applications, the front-end device, which may be any handheld electronic device such as a smartphone or personal digital assistant (PDA), captures the speech signal, extracts the speech features, and sends the speech-feature vector sequence to the back-end server for decoding. Because the front-end mobile device has limited computation capacity, battery power, and bandwidth, a feasible strategy is to reduce the frame rate of the speech-feature vector sequence. Previously, we proposed a method that adjusts the transition probabilities of the hidden Markov model to address the degradation in recognition accuracy caused by the frame-rate mismatch between the input and the original model. That method, referred to as the adapting-then-connecting approach, adapts each model individually and then connects the adapted models to form a word network for speech recognition. We have found that this model adaptation approach introduces transitions that skip too many states and increases the number of insertion errors. In this study, we propose an improved model adaptation approach, denoted the connecting-then-adapting approach, that first connects the individual models to form a word network and then adapts the connected network for speech recognition. The new approach calculates the transition matrix of a connected model, adapts that transition matrix according to the frame rate, and then creates a transition arc for each transition probability. It can better align the speech feature sequence with the states in the word network and therefore reduces the number of insertion errors. We conducted experiments to investigate the effectiveness of the new approach and analyzed the results with respect to insertion, deletion, and substitution errors. The experimental results indicate that the proposed method obtains a better recognition rate than the previous method.
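The connecting-then-adapting steps named in the abstract (connect, compute the transition matrix, adapt it, create arcs) can be illustrated with a toy sketch. The abstract does not give the adaptation formula, so this sketch assumes that keeping every k-th frame corresponds to raising the transition matrix to the k-th power. The helper names (`left_to_right`, `connect`, `arcs`), the self-loop probability 0.6, the three-state models, and k = 2 are all hypothetical choices for illustration, not values from the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, k):
    """Raise a square matrix to a non-negative integer power."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


def left_to_right(n_states, stay):
    """Left-to-right chain: n_states emitting states plus one absorbing exit."""
    size = n_states + 1
    a = [[0.0] * size for _ in range(size)]
    for i in range(n_states):
        a[i][i] = stay            # self-loop
        a[i][i + 1] = 1.0 - stay  # advance to the next state
    a[n_states][n_states] = 1.0   # exit state absorbs
    return a


def connect(n_models, states_per_model, stay):
    """Assumption: identical models joined in series behave as one longer
    chain (the last state of one model leads to the first of the next)."""
    return left_to_right(n_models * states_per_model, stay)


def arcs(a, eps=1e-12):
    """One arc (source, target, probability) per nonzero transition."""
    return [(i, j, p) for i, row in enumerate(a)
            for j, p in enumerate(row) if p > eps]


k = 2                                  # assumed frame-rate reduction factor
connected = connect(2, 3, 0.6)         # step 1: connect, then take its matrix
adapted = mat_pow(connected, k)        # step 2: adapt the connected matrix
network_arcs = arcs(adapted)           # step 3: one arc per probability

for src, dst, prob in network_arcs:
    print(f"{src} -> {dst}: {prob:.2f}")
```

In this toy network, states 0 to 2 belong to the first model and states 3 to 5 to the second. Because adaptation happens after connection, the adapted network contains the arc 2 -> 4, which crosses the model boundary; adapting each model in isolation cannot produce that arc directly.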

Similar articles

Adaptation of hidden Markov models for recognizing speech of reduced frame rate.

IEEE Trans Cybern. 2013 Dec;43(6):2114-21. doi: 10.1109/TCYB.2013.2240450.

Model adaptation method for recognition of speech with missing frames.

J Acoust Soc Am. 2014 Mar;135(3):EL166-71. doi: 10.1121/1.4865264.

Hierarchical singleton-type recurrent neural fuzzy networks for noisy speech recognition.

IEEE Trans Neural Netw. 2007 May;18(3):833-43. doi: 10.1109/TNN.2007.891194.

Experiments with fast Fourier transform, linear predictive and cepstral coefficients in dysarthric speech recognition algorithms using hidden Markov Model.

IEEE Trans Neural Syst Rehabil Eng. 2005 Dec;13(4):558-61. doi: 10.1109/TNSRE.2005.856074.

Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

Neural Comput. 2014 Mar;26(3):523-56. doi: 10.1162/NECO_a_00557. Epub 2013 Dec 9.

Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals.

Med Eng Phys. 2006 Oct;28(8):741-8. doi: 10.1016/j.medengphy.2005.11.002. Epub 2005 Dec 15.

Hybrid simulated annealing and its application to optimization of hidden Markov models for visual speech recognition.

IEEE Trans Syst Man Cybern B Cybern. 2010 Aug;40(4):1188-96. doi: 10.1109/TSMCB.2009.2036753. Epub 2010 Jan 8.

Improved phoneme-based myoelectric speech recognition.

IEEE Trans Biomed Eng. 2009 Aug;56(8):2016-23. doi: 10.1109/TBME.2009.2024079. Epub 2009 Jun 16.

EMG-based speech recognition using hidden markov models with global control variables.

IEEE Trans Biomed Eng. 2008 Mar;55(3):930-40. doi: 10.1109/TBME.2008.915658.

Cited by

[Digital technology and children's maxillofacial management].

Lin Chuang Er Bi Yan Hou Tou Jing Wai Ke Za Zhi. 2023 Aug;37(8):662-666. doi: 10.13201/j.issn.2096-7993.2023.08.013.

References

Adaptation of hidden Markov models for recognizing speech of reduced frame rate.

IEEE Trans Cybern. 2013 Dec;43(6):2114-21. doi: 10.1109/TCYB.2013.2240450.
