Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA.
J Acoust Soc Am. 2019 Dec;146(6):4458. doi: 10.1121/1.5139413.
This paper proposes a modular architecture for articulatory synthesis from a gestural specification, comprising relatively simple models for the vocal tract, the glottis, aero-acoustics, and articulatory control. The vocal tract module combines a statistical midsagittal articulatory model, derived by factor analysis of air-tissue boundaries in real-time magnetic resonance imaging data, with an αβ model for converting midsagittal sections to area-function specifications. The aero-acoustics and glottis models are based on a software implementation of classic work by Maeda. The articulatory control module, inspired by the task dynamics model, uses dynamical systems that implement articulatory gestures to animate the statistical articulatory model. Results are presented on synthesizing vowel-consonant-vowel sequences with plosive consonants, using models built on data from, and simulating the behavior of, two different speakers.
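As a rough illustration of two components the abstract names, the sketch below shows (i) an αβ-style conversion from midsagittal cross-distances to cross-sectional areas, A = α·d^β, and (ii) a critically damped second-order dynamical system of the kind used in task dynamics to drive a tract variable toward a gestural target. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, coefficient values, stiffness, and time step are all illustrative placeholders.

```python
import numpy as np

def alphabeta_area(d, alpha, beta):
    """Convert midsagittal cross-distances d (cm) to areas (cm^2)
    via the alpha-beta model A = alpha * d**beta. The alpha/beta
    values passed in are placeholders, not fitted coefficients."""
    return alpha * np.power(d, beta)

def gesture_step(z, zdot, target, omega, dt):
    """One Euler step of a critically damped second-order system,
    z'' = -2*omega*z' - omega**2 * (z - target), which moves the
    tract variable z toward the gestural target without overshoot."""
    zddot = -2.0 * omega * zdot - omega**2 * (z - target)
    return z + dt * zdot, zdot + dt * zddot

# Toy trajectory: a constriction aperture closing for a plosive.
# Initial aperture 1.0 cm, target 0.0 cm (full closure); omega and
# dt are assumed values chosen only to make the example run.
z, zdot = 1.0, 0.0
for _ in range(200):
    z, zdot = gesture_step(z, zdot, target=0.0, omega=40.0, dt=0.001)
print(f"aperture after 0.2 s: {z:.4f} cm")
print(f"area at that aperture: {alphabeta_area(z, 1.5, 1.4):.5f} cm^2")
```

In an actual synthesizer of the kind described, α and β would vary along the vocal tract and the dynamical systems would be gated by gestural activation intervals; the sketch keeps both fixed for brevity.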