Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA.
J Acoust Soc Am. 2019 Dec;146(6):4458. doi: 10.1121/1.5139413.
This paper proposes a modular architecture for articulatory synthesis from a gestural specification, comprising relatively simple models for the vocal tract, the glottis, aero-acoustics, and articulatory control. The vocal tract module combines a statistical midsagittal articulatory model, derived by factor analysis of air-tissue boundaries in real-time magnetic resonance imaging data, with an αβ model for converting midsagittal sections to area-function specifications. The aero-acoustics and glottis models are based on a software implementation of classic work by Maeda. The articulatory control module, inspired by the task dynamics model, uses dynamical systems that implement articulatory gestures to animate the statistical articulatory model. Results are presented on synthesizing vowel-consonant-vowel sequences with plosive consonants, using models built on data from, and simulating the behavior of, two different speakers.
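As a rough illustration of two components the abstract names, the sketch below shows (i) an αβ-style conversion from midsagittal cross-distances to cross-sectional areas, A = α·d^β, and (ii) a critically damped second-order dynamical system of the kind used in task dynamics to drive a tract variable toward a gestural target. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, coefficient values, stiffness, and time step are all illustrative placeholders.

```python
import numpy as np

def alphabeta_area(d, alpha, beta):
    """Convert midsagittal cross-distances d (cm) to areas (cm^2)
    via the alpha-beta model A = alpha * d**beta. The alpha/beta
    values passed in are placeholders, not fitted coefficients."""
    return alpha * np.power(d, beta)

def gesture_step(z, zdot, target, omega, dt):
    """One Euler step of a critically damped second-order system,
    z'' = -2*omega*z' - omega**2 * (z - target), which moves the
    tract variable z toward the gestural target without overshoot."""
    zddot = -2.0 * omega * zdot - omega**2 * (z - target)
    return z + dt * zdot, zdot + dt * zddot

# Toy trajectory: a constriction aperture closing for a plosive.
# Initial aperture 1.0 cm, target 0.0 cm (full closure); omega and
# dt are assumed values chosen only to make the example run.
z, zdot = 1.0, 0.0
for _ in range(200):
    z, zdot = gesture_step(z, zdot, target=0.0, omega=40.0, dt=0.001)
print(f"aperture after 0.2 s: {z:.4f} cm")
print(f"area at that aperture: {alphabeta_area(z, 1.5, 1.4):.5f} cm^2")
```

In an actual synthesizer of the kind described, α and β would vary along the vocal tract and the dynamical systems would be gated by gestural activation intervals; the sketch keeps both fixed for brevity.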