Sigma-Lognormal Modeling of Speech.

Author information

Carmona-Duarte C, Ferrer M A, Plamondon R, Gómez-Rodellar A, Gómez-Vilda P

Affiliations

Instituto Universitario Para El Desarrollo Tecnológico Y La Innovación en Comunicaciones, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain.

Laboratoire Scribens, Département de Génie Électrique, Polytechnique Montréal, Montreal, QC Canada.

Publication information

Cognit Comput. 2021;13(2):488-503. doi: 10.1007/s12559-020-09803-8. Epub 2021 Feb 7.

Abstract

Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. Previous speech motor models were proposed to understand how speech movement is produced and how the resulting speech varies when some parameters are changed. However, the inverse approach, in which the muscular response parameters and the subject's age are derived from real continuous speech, is not possible with such models. By contrast, in the handwriting field, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. A method based on the kinematic theory of rapid human movements and its associated Sigma-lognormal model is applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech as a response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR-TIMIT database and the (German) Saarbrücken Voice Database, including people of different ages, with and without laryngeal pathologies, corroborate the link between the extracted parameters and aging, on the one hand, and the proportion between the first and second formants required in applying the kinematic theory of rapid human movements, on the other. The results should drive innovative developments in the modeling and understanding of speech kinematics.
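
As a rough illustration of the Sigma-lognormal model mentioned in the abstract, the sketch below sums lognormal velocity components in the form commonly used in the handwriting literature on the kinematic theory. The abstract does not specify parameter names or values, so everything here is an assumption made for illustration: the function names, the symbols `D`, `t0`, `mu`, `sigma`, `theta_s`, `theta_e`, and any numbers passed in. The formant-to-movement transformation described in the paper is not reproduced.

```python
import math
import numpy as np

# Vectorized error function; otypes is set so empty inputs are handled too.
_erf = np.vectorize(math.erf, otypes=[float])


def lognormal_speed(t, D, t0, mu, sigma):
    """Speed of one component: a lognormal in (t - t0), scaled by amplitude D.

    D, t0, mu, sigma are illustrative names (amplitude, onset time,
    log-time delay, log-response time), not values from the paper.
    """
    t = np.asarray(t, dtype=float)
    v = np.zeros_like(t)
    active = t > t0                      # the component is zero before its onset
    dt = t[active] - t0
    v[active] = (D / (sigma * np.sqrt(2.0 * np.pi) * dt)
                 * np.exp(-(np.log(dt) - mu) ** 2 / (2.0 * sigma ** 2)))
    return v


def component_angle(t, t0, mu, sigma, theta_s, theta_e):
    """Direction of one component, moving from theta_s to theta_e (radians)."""
    t = np.asarray(t, dtype=float)
    phi = np.full_like(t, theta_s)
    active = t > t0
    dt = t[active] - t0
    phi[active] = theta_s + 0.5 * (theta_e - theta_s) * (
        1.0 + _erf((np.log(dt) - mu) / (sigma * np.sqrt(2.0))))
    return phi


def sigma_lognormal_velocity(t, components):
    """Sum the velocity vectors of all components (the 'Sigma' in the name)."""
    t = np.asarray(t, dtype=float)
    vx = np.zeros_like(t)
    vy = np.zeros_like(t)
    for c in components:
        speed = lognormal_speed(t, c["D"], c["t0"], c["mu"], c["sigma"])
        phi = component_angle(t, c["t0"], c["mu"], c["sigma"],
                              c["theta_s"], c["theta_e"])
        vx += speed * np.cos(phi)
        vy += speed * np.sin(phi)
    return vx, vy
```

In this form, each component's speed integrates to its amplitude `D` over time, which gives a simple sanity check when fitting or reconstructing a trajectory.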

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5c5/7943521/ebc21391728e/12559_2020_9803_Fig1_HTML.jpg
