Silent Speech Recognition as an Alternative Communication Device for Persons with Laryngectomy.

Authors

Meltzner Geoffrey S, Heaton James T, Deng Yunbin, De Luca Gianluca, Roy Serge H, Kline Joshua C

Affiliations

VocaliD, Inc. Belmont, MA, 02478, USA.

Harvard Medical School in the Department of Surgery, Massachusetts General Hospital Voice Center, Boston, MA 02114.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2017 Dec;25(12):2386-2398. doi: 10.1109/TASLP.2017.2740000. Epub 2017 Nov 28.

Abstract

Each year, thousands of individuals require surgical removal of their larynx (voice box) due to trauma or disease, and thereafter need an alternative voice source or assistive device to communicate verbally. Although natural voice is lost after laryngectomy, most muscles controlling speech articulation remain intact. Surface electromyographic (sEMG) activity of the speech musculature can be recorded from the neck and face and used for automatic speech recognition, providing speech-to-text or synthesized speech as an alternative means of communication. This holds even when speech is mouthed or spoken in a silent (subvocal) manner, making it an appropriate communication platform after laryngectomy. In this study, 8 individuals at least 6 months after total laryngectomy were recorded using 8 sEMG sensors on the face (4) and neck (4) while reading phrases constructed from a 2,500-word vocabulary. One set of phrases was used to train phoneme-based recognition models for each of the 39 commonly used phonemes in English, and the remaining phrases were used to test the models' word recognition based on phoneme identification from running speech. Word error rates averaged 10.3% for the full 8-sensor set (9.5% for the top 4 participants) and 13.6% when the sensor set was reduced to 4 locations per individual (n=7). This study provides a compelling proof of concept for sEMG-based alaryngeal speech recognition, with strong potential for further improvement in recognition performance.
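The abstract reports results as word error rates but does not say how they were scored. As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions from a word-level edit-distance alignment, divided by the number of reference words), the metric could be computed as below. The function name `word_error_rate` and the example phrases are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate under the conventional (S + D + I) / N definition.

    Assumption: words are whitespace-separated and compared exactly.
    This is an illustrative sketch, not the scoring tool used in the study.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical phrases: one substituted word out of four reference words.
    print(word_error_rate("please call me later", "please call me latest"))
```

Under this definition, a word error rate of 10.3% corresponds to roughly one word-level error for every ten reference words.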
