Department of Electrical Engineering, University of California, Los Angeles, California 90095, USA.
J Acoust Soc Am. 2011 Apr;129(4):2144-62. doi: 10.1121/1.3514544.
This paper presents a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds, performed by analysis-by-synthesis with the Maeda articulatory model. Vocal tract (VT) acoustics are computed with chain matrices; the chain matrix derivatives with respect to the area function are calculated and used in a quasi-Newton method to optimize articulatory trajectories. The cost function combines a distance measure between the first three natural and synthesized formants with parameter regularization and continuity terms. Calibration of the Maeda model, using a cost function, to two speakers (one male and one female) from the University of Wisconsin x-ray microbeam (XRMB) database is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region, and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. For the male speaker, estimated midsagittal VT outlines agreed well with measured XRMB tongue pellet positions for several vowels and diphthongs, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
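The three-part cost function described above (formant distance, parameter regularization, trajectory continuity) can be illustrated with a minimal sketch. The abstract does not give the exact form of any term, so everything here is an assumption made for illustration: the squared relative formant error, the squared-magnitude regularizer, the squared frame-to-frame difference, the weights `reg_weight` and `cont_weight`, and the function name `inversion_cost`. The paper's actual formulation may differ.

```python
def inversion_cost(natural, synthesized, params, reg_weight=0.1, cont_weight=1.0):
    """Hypothetical inversion cost over a sequence of frames.

    natural, synthesized: lists of frames, each holding the first three
        formant frequencies in Hz.
    params: list of frames, each holding the articulatory parameter values.
    All term definitions and weights are illustrative assumptions.
    """
    # Formant term: squared relative error between natural and synthesized formants.
    formant_term = sum(
        ((f_nat - f_syn) / f_nat) ** 2
        for nat_frame, syn_frame in zip(natural, synthesized)
        for f_nat, f_syn in zip(nat_frame, syn_frame)
    )
    # Regularization term: penalizes parameter values far from zero (neutral).
    reg_term = sum(p ** 2 for frame in params for p in frame)
    # Continuity term: penalizes parameter jumps between consecutive frames.
    cont_term = sum(
        (cur - prev) ** 2
        for prev_frame, cur_frame in zip(params, params[1:])
        for prev, cur in zip(prev_frame, cur_frame)
    )
    return formant_term + reg_weight * reg_term + cont_weight * cont_term
```

In a setup like the one the abstract describes, a scalar cost of this kind would be minimized over `params` by a quasi-Newton optimizer, with the synthesized formants recomputed from the articulatory model at each step; that synthesis step is outside the scope of this sketch.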