Raiff Laura, Turashvili Dea, Heaton James T, De Luca Gianluca, Kline Joshua C, Vojtech Jenny
Delsys, Inc., Natick, Massachusetts 01760; Altec, Inc., Natick, Massachusetts 01760.
Delsys, Inc., Natick, Massachusetts 01760; Altec, Inc., Natick, Massachusetts 01760; Department of Biomedical Engineering, Boston University, Boston, Massachusetts, 02215.
J Voice. 2024 Dec 5. doi: 10.1016/j.jvoice.2024.10.024.
People who undergo a total laryngectomy lose their natural voice and depend on alaryngeal technologies for communication. However, these technologies are often difficult to use and lack prosody. Surface electromyography-based silent speech interfaces are novel communication systems that overcome many of the shortcomings of traditional alaryngeal speech and have the potential to seamlessly incorporate individualized prosody. The purpose of this study was to (1) validate, through listening experiments, the ability of alaryngeal silent speech to effectively incorporate pitch modulations, a key prosodic element in natural speech, into synthesized speech and (2) determine the key features of these communication devices according to core users.
People with laryngectomy (n = 15) and their primary communication partners (n = 5) listened to synthesized sentences with differing prosodic content generated from deep regression neural networks developed in our prior work. Specifically, the fundamental frequency (f) contour of each sentence was manipulated in four ways: (1) flattened to the average f, (2) altered according to a discrete, sentence-level classification of muscle activity, (3) altered according to a continuous mapping of muscle activity, and (4) filtered to emulate speech from an electrolarynx (EL). Listeners ranked the f contours of each sentence in terms of speech naturalness and rated the importance of various speech aid features.
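As an illustration of the first manipulation only (flattening a contour to its average f), a minimal sketch follows. It is not the study's implementation: the function name `flatten_contour`, the frame-wise representation in Hz, and the use of NaN to mark unvoiced frames are all assumptions made for this example.

```python
import numpy as np


def flatten_contour(f_hz):
    """Flatten a frame-wise f contour to its average value.

    Assumption for this sketch: unvoiced frames are marked as NaN
    (or as non-positive values) and are left unvoiced (NaN) in the output.
    """
    f_hz = np.asarray(f_hz, dtype=float)
    # Voiced frames are those with a finite, positive frequency value.
    voiced = np.isfinite(f_hz) & (f_hz > 0)
    # Start from an all-unvoiced contour, then fill voiced frames with the mean.
    flat = np.full_like(f_hz, np.nan)
    if voiced.any():
        flat[voiced] = f_hz[voiced].mean()
    return flat


# Hypothetical contour in Hz; the values are invented for illustration.
contour = [np.nan, 100.0, 120.0, 140.0, np.nan]
flat = flatten_contour(contour)
```

Every voiced frame in `flat` carries the same value, which gives the monotone reference condition against which the muscle-activity-driven contours can be compared.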
Continuous contours were rated higher than all other types of contours, and monotonic EL contours were rated the lowest. Speech aid features were rated, from highest to lowest, in the following order: sound quality, intelligibility, pitch, delay, volume, hands-free, maintenance, cost, wearability, training, and visibility.
These results will help inform future development of silent speech interfaces and shape priorities of communication devices toward the preferences of their users.