Niu Chuanxin M, Lee Kangwoo, Houde John F, Sanger Terence D
Department of Rehabilitation, School of Medicine, Ruijin Hospital, Shanghai Jiao Tong University, Shanghai, China.
Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA.
Front Hum Neurosci. 2015 Jan 22;8:1077. doi: 10.3389/fnhum.2014.01077. eCollection 2014.
For children with severe cerebral palsy (CP), social and emotional interactions can be significantly limited due to impaired speech motor function. However, if continuous voluntary control signals can be extracted from the electromyogram (EMG) of limb muscles, then EMG may be used to drive the synthesis of intelligible speech with controllable speed, intonation, and articulation. We report an important first step: the feasibility of controlling a vowel synthesizer using non-speech muscles. A classic formant-based speech synthesizer is adapted so that the lowest two formants are controlled by surface EMG from skeletal muscles. EMG signals are filtered using a non-linear Bayesian filtering algorithm that provides the high bandwidth and accuracy required for speech tasks. The frequencies of the first two formants determine points in a 2D plane, and vowels are targets on this plane. We focus on testing the overall feasibility of producing intelligible English vowels with myocontrol using two straightforward EMG-formant mappings; more mappings can be tested in the future to optimize intelligibility. Vowel generation was tested on 10 healthy adults and 4 patients with dyskinetic CP. Five English vowels were generated by subjects in pseudo-random order after only 10 min of device familiarization. The fraction of vowels correctly identified by 4 naive listeners exceeded 80% for vowels generated by healthy adults and 57% for those generated by patients with CP. Our goal is a continuous "virtual voice" with personalized intonation and articulation that will restore not only the intellectual content but also the social and emotional content of speech for children and adults with severe movement disorders.
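The control scheme described above can be sketched in code: two EMG envelopes drive the first two formants (F1, F2), and a point in the formant plane is matched to the nearest vowel target. This is a minimal illustration only; the envelope extractor below is a simple rectify-and-low-pass stand-in for the paper's non-linear Bayesian filter, the linear EMG-formant mapping is one hypothetical choice (the paper tests two mappings, not specified here), and the vowel targets are approximate textbook formant values, not the study's actual targets.

```python
import numpy as np

# Approximate (F1, F2) vowel targets in Hz (illustrative values only,
# roughly following classic American English formant measurements).
VOWEL_TARGETS = {
    "i": (270.0, 2290.0),   # as in "beet"
    "e": (530.0, 1840.0),   # as in "bet"
    "ae": (660.0, 1720.0),  # as in "bat"
    "a": (730.0, 1090.0),   # as in "father"
    "u": (300.0, 870.0),    # as in "boot"
}

def emg_envelope(emg, fs, fc=10.0):
    """Crude EMG envelope: full-wave rectification followed by a
    first-order low-pass filter (a stand-in for Bayesian filtering)."""
    rectified = np.abs(np.asarray(emg, dtype=float))
    alpha = 1.0 / (1.0 + fs / (2.0 * np.pi * fc))  # smoothing coefficient
    env = np.empty_like(rectified)
    acc = 0.0
    for i, x in enumerate(rectified):
        acc += alpha * (x - acc)
        env[i] = acc
    return env

def envelopes_to_formants(e1, e2,
                          f1_range=(250.0, 800.0),
                          f2_range=(800.0, 2400.0)):
    """One hypothetical EMG-formant mapping: each normalized envelope
    (0..1) linearly drives one formant within a plausible vowel range."""
    f1 = f1_range[0] + np.clip(e1, 0.0, 1.0) * (f1_range[1] - f1_range[0])
    f2 = f2_range[0] + np.clip(e2, 0.0, 1.0) * (f2_range[1] - f2_range[0])
    return f1, f2

def nearest_vowel(f1, f2):
    """Classify a point in the 2D formant plane by its nearest vowel target."""
    return min(VOWEL_TARGETS,
               key=lambda v: (VOWEL_TARGETS[v][0] - f1) ** 2
                           + (VOWEL_TARGETS[v][1] - f2) ** 2)
```

In this sketch, relaxed muscles (envelopes near zero) map to the low-F1, low-F2 corner of the plane, and co-contraction moves the synthesized vowel across the target space; a real system would feed (F1, F2) into the formant synthesizer continuously rather than classifying discrete vowels.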