Zhang Zhaoyan
Head and Neck Surgery, UCLA School of Medicine, Los Angeles, CA 90095.
Proc Meet Acoust. 2017 Jun 25;30(1). doi: 10.1121/2.0000572. Epub 2017 Sep 20.
While physically-based continuum models of voice production have potential applications in clinical intervention for voice disorders and in personalized natural speech synthesis, their current use is limited by the high computational cost of resolving the complex fluid-structure interaction during voice production. The goal of this study is to summarize our recent efforts in developing a physically-based, computationally efficient continuum model of voice production toward near real-time applications. The model uses an eigenmode-based formulation of the governing equations, in which vocal fold eigenmodes are used as building blocks to reconstruct more complex vocal fold vibration patterns. Simulations show that reasonable accuracy in the fundamental frequency, vocal intensity, and selected spectral measures can be reached using the first 100 vocal fold eigenmodes, which significantly reduces both the degrees of freedom of the governing equations (compared with tens of thousands in finite element models) and the computational time. For applications in which absolute values are less essential, an even smaller number of eigenmodes is expected to be acceptable. Examples are provided to demonstrate the capability of the model in modeling a large range of voice qualities, natural voice quality change over time, and speech production in general.
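The idea of using eigenmodes as building blocks can be illustrated with a minimal sketch. It is not the vocal fold model summarized above: as a loudly stated assumption, it replaces the vocal fold with a toy one-dimensional string fixed at both ends, whose eigenmodes are known sine functions. The names `mode`, `project`, `reconstruct`, `rms_error`, and `target` are hypothetical and chosen only for this illustration. The sketch projects a target shape onto the first N modes and shows that the reconstruction error shrinks as N grows, which is the mechanism that lets a truncated modal basis stand in for a much larger set of degrees of freedom.

```python
import math


def mode(n, x, length=1.0):
    """n-th eigenmode of a fixed-fixed uniform string (toy stand-in, an assumption)."""
    return math.sqrt(2.0 / length) * math.sin(n * math.pi * x / length)


def project(shape, n_modes, grid):
    """Modal coordinates q_n = integral of shape(x) * mode_n(x) dx (trapezoid rule)."""
    dx = grid[1] - grid[0]
    coeffs = []
    for n in range(1, n_modes + 1):
        vals = [shape(x) * mode(n, x) for x in grid]
        coeffs.append(dx * (sum(vals) - 0.5 * (vals[0] + vals[-1])))
    return coeffs


def reconstruct(coeffs, x):
    """Rebuild the shape at x as a weighted sum of the retained eigenmodes."""
    return sum(q * mode(n, x) for n, q in enumerate(coeffs, start=1))


def rms_error(shape, coeffs, grid):
    """Root-mean-square difference between the shape and its modal reconstruction."""
    sq = sum((shape(x) - reconstruct(coeffs, x)) ** 2 for x in grid)
    return math.sqrt(sq / len(grid))


def target(x):
    """Hypothetical displacement pattern: a triangular bump peaked off-centre."""
    return x / 0.3 if x < 0.3 else (1.0 - x) / 0.7


# 401 grid points play the role of the "full" discretization;
# the modal description uses only 2, 10, or 50 coordinates.
grid = [i / 400 for i in range(401)]
errors = {n: rms_error(target, project(target, n, grid), grid) for n in (2, 10, 50)}

for n, err in errors.items():
    print(f"{n:3d} modes -> RMS reconstruction error {err:.2e}")
```

In this toy setting the unknowns drop from 401 grid values to a few dozen modal coordinates. How many modes are needed in practice depends on the quantity of interest, consistent with the observation above that fewer modes may suffice when absolute values are less essential.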