Kröger Bernd J
Department of Phoniatrics, Pedaudiology, and Communication Disorders, RWTH Aachen University, Aachen, Germany.
Front Robot AI. 2022 Mar 8;9:796739. doi: 10.3389/frobt.2022.796739. eCollection 2022.
Modeling speech production and speech articulation is still an evolving research topic. Some current core questions are: What is the underlying (neural) organization for controlling speech articulation? How can speech articulators such as the lips and tongue, and their movements, be modeled in an efficient but also biologically realistic way? How can high-quality articulatory-acoustic models be developed that lead to high-quality articulatory speech synthesis? Thus, on the one hand, computer modeling will help us to uncover the underlying biological as well as acoustic-articulatory concepts of speech production; on the other hand, further modeling efforts will help us to reach the goal of high-quality articulatory-acoustic speech synthesis based on more detailed knowledge of vocal tract acoustics and speech articulation. Currently, articulatory models are not able to reach the quality level of corpus-based speech synthesis. Moreover, biomechanical and neuromuscular-based approaches are complex and still not usable for sentence-level speech synthesis. This paper lists many computer-implemented articulatory models and provides criteria for dividing articulatory models into different categories. A recent major research question, i.e., how to control articulatory models in a neurobiologically adequate manner, is discussed in detail. It can be concluded that there is a strong need to further develop articulatory-acoustic models in order to test quantitative, neurobiologically based control concepts for speech articulation as well as to uncover the remaining details of human articulatory and acoustic signal generation. Furthermore, these efforts may help us to approach the goal of establishing high-quality articulatory-acoustic as well as neurobiologically grounded speech synthesis.