Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck.

Author information

Vojtech Jennifer M, Mitchell Claire L, Raiff Laura, Kline Joshua C, De Luca Gianluca

Affiliations

Delsys, Inc., Natick, MA 01760, USA.

Altec, Inc., Natick, MA 01760, USA.

Publication information

Vibration. 2022 Dec;5(4):692-710. doi: 10.3390/vibration5040041. Epub 2022 Oct 13.

DOI: 10.3390/vibration5040041

PMID: 36299552

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9592063/

Abstract

Silent speech interfaces (SSIs) enable speech recognition and synthesis in the absence of an acoustic signal. Yet, the archetypal SSI fails to convey the expressive attributes of prosody such as pitch and loudness, leading to lexical ambiguities. The aim of this study was to determine the efficacy of using surface electromyography (sEMG) as an approach for predicting continuous acoustic estimates of prosody. Ten participants performed a series of vocal tasks including sustained vowels, phrases, and monologues while acoustic data was recorded simultaneously with sEMG activity from muscles of the face and neck. A battery of time-, frequency-, and cepstral-domain features extracted from the sEMG signals were used to train deep regression neural networks to predict fundamental frequency and intensity contours from the acoustic signals. We achieved an average accuracy of 0.01 ST and precision of 0.56 ST for the estimation of fundamental frequency, and an average accuracy of 0.21 dB SPL and precision of 3.25 dB SPL for the estimation of intensity. This work highlights the importance of using sEMG as an alternative means of detecting prosody and shows promise for improving SSIs in future development.
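
The accuracy and precision figures reported in the abstract can be illustrated with a small sketch. The abstract does not define the two terms, so the code below assumes one common reading: accuracy is the mean signed prediction error and precision is the standard deviation of that error, with fundamental frequency error expressed in semitones (ST) as 12 times the base-2 logarithm of the predicted-to-reference ratio. The function names and the sample contours are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def hz_to_semitone_error(predicted_hz, reference_hz):
    """Signed error in semitones between a predicted and a reference f0 value."""
    return 12.0 * math.log2(predicted_hz / reference_hz)


def accuracy_and_precision(errors):
    """Return (mean error, sample standard deviation of the error).

    Assumption: accuracy = mean signed error, precision = SD of the error.
    """
    return mean(errors), stdev(errors)


# Hypothetical f0 contours in Hz (illustrative values, not data from the study).
reference_f0 = [110.0, 115.0, 120.0, 118.0, 112.0]
predicted_f0 = [112.0, 114.0, 121.5, 116.0, 113.0]

errors_st = [
    hz_to_semitone_error(p, r) for p, r in zip(predicted_f0, reference_f0)
]
bias_st, spread_st = accuracy_and_precision(errors_st)
print(f"accuracy (mean error): {bias_st:.2f} ST")
print(f"precision (SD of error): {spread_st:.2f} ST")
```

Under this reading, a near-zero accuracy value such as the reported 0.01 ST would mean the predictions are not systematically sharp or flat, while the precision value describes how widely individual frame errors scatter around that mean. The same two statistics could be applied to intensity errors in dB SPL by taking plain differences instead of log ratios.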

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f686/9592063/dcc26415b23f/nihms-1843426-f0001.jpg
