Lee Ki-Seung
Department of Electronic Engineering, Konkuk University, 1 Hwayang-dong, Gwangjin-gu, Seoul 143-701, Korea.
IEEE Trans Biomed Eng. 2008 Mar;55(3):930-40. doi: 10.1109/TBME.2008.915658.
It is well known that a strong relationship exists between human voices and the movement of articulatory facial muscles. In this paper, we utilize this knowledge to implement an automatic speech recognition scheme that uses solely surface electromyogram (EMG) signals. The sequence of EMG signals for each word is modeled within a hidden Markov model (HMM) framework. The main objective of the work is to build a model for the state observation density when multichannel observation sequences are given. The proposed model reflects the dependencies between the EMG signals, which are described by introducing a global control variable. We also develop an efficient model training method based on a maximum likelihood criterion. In a preliminary study, 60 isolated words were used as recognition variables. EMG signals were acquired from three articulatory facial muscles. The findings indicate that such a system may have the capacity to recognize speech signals with an accuracy of up to 87.07%, which is superior to that of an independent probabilistic model.
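The abstract contrasts a channel-independent observation density with one that couples the EMG channels through a global control variable. A minimal sketch of that idea, under assumed Gaussian channel densities and an illustrative discrete control variable (all names, shapes, and parameters here are hypothetical, not the paper's exact formulation): the per-state density becomes a mixture over control values, with channels conditionally independent only given both the state and the control value.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def multichannel_density(obs, weights, mu, var):
    """Hypothetical state observation density for K EMG channels.

    p(o | s) = sum_c P(c | s) * prod_k N(o_k; mu[c, k], var[c, k])

    obs:     (K,)  one scalar feature per EMG channel
    weights: (C,)  mixing weights P(c | s) over the global control variable
    mu, var: (C, K) per-control-value, per-channel Gaussian parameters
    """
    per_channel = gaussian_pdf(obs[None, :], mu, var)  # (C, K)
    joint_given_c = per_channel.prod(axis=1)           # (C,) product over channels
    return float(weights @ joint_given_c)              # marginalize the control variable

# The channel-independent baseline is the special case C = 1:
# a single control value reduces the mixture to a plain product of
# per-channel densities.
```

With C > 1, a channel's likelihood is no longer evaluated in isolation; the shared control variable ties the channels together, which is the dependency structure the abstract attributes to the proposed model.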