A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.

Affiliation

Department of Electrical Engineering, University of California, Los Angeles, California 90095, USA.

Publication information

J Acoust Soc Am. 2011 Apr;129(4):2144-62. doi: 10.1121/1.3514544.

Abstract

In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
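
The chain-matrix and cost-function ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a lossless tube with an ideal open end at the lips, it replaces the Maeda model with a hypothetical one-parameter stand-in (`toy_synth`, which only changes tube length), and the weights `w_reg` and `w_cont` and the squared relative formant error are illustrative choices, since the abstract does not give the exact form of the cost. No derivatives or quasi-Newton step are included.

```python
import math

C_SOUND = 35000.0  # speed of sound in cm/s (assumed value)
RHO = 0.00114      # air density in g/cm^3 (assumed value)


def section_matrix(area, length, freq):
    """Chain (ABCD) matrix of one lossless uniform tube section."""
    k = 2.0 * math.pi * freq / C_SOUND   # wavenumber in rad/cm
    z0 = RHO * C_SOUND / area            # characteristic impedance
    c, s = math.cos(k * length), math.sin(k * length)
    return ((c, 1j * z0 * s), (1j * s / z0, c))


def matmul2(m, n):
    """Product of two 2x2 matrices stored as nested tuples."""
    return (
        (m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]),
        (m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]),
    )


def chain_d(areas, lengths, freq):
    """D element of the glottis-to-lips chain matrix (real for a lossless tube).

    With zero pressure at the lips, U_lips / U_glottis = 1 / D,
    so formants are the frequencies where D crosses zero.
    """
    total = ((1.0, 0.0), (0.0, 1.0))
    for area, length in zip(areas, lengths):
        total = matmul2(total, section_matrix(area, length, freq))
    return total[1][1].real


def formants(areas, lengths, n=3, f_max=5000.0, step=10.0):
    """First n zero crossings of D: coarse scan, then bisection."""
    found = []
    f_lo = 5.0
    d_lo = chain_d(areas, lengths, f_lo)
    while f_lo < f_max and len(found) < n:
        f_hi = f_lo + step
        d_hi = chain_d(areas, lengths, f_hi)
        if d_lo * d_hi < 0.0:
            a, b, d_a = f_lo, f_hi, d_lo
            for _ in range(40):
                mid = 0.5 * (a + b)
                d_mid = chain_d(areas, lengths, mid)
                if d_a * d_mid <= 0.0:
                    b = mid
                else:
                    a, d_a = mid, d_mid
            found.append(0.5 * (a + b))
        f_lo, d_lo = f_hi, d_hi
    return found


def toy_synth(params):
    """Hypothetical stand-in for an articulatory model: params[0] lengthens the tube (cm)."""
    n_sections = 35
    section_len = (17.5 + params[0]) / n_sections
    return formants([3.0] * n_sections, [section_len] * n_sections)


def inversion_cost(frames, targets, synth=toy_synth, w_reg=0.01, w_cont=0.1):
    """Formant distance plus regularization and frame-to-frame continuity terms."""
    cost = 0.0
    for t, (params, f_nat) in enumerate(zip(frames, targets)):
        f_syn = synth(params)
        cost += sum(((fs - fn) / fn) ** 2 for fs, fn in zip(f_syn, f_nat))
        cost += w_reg * sum(x * x for x in params)
        if t > 0:
            cost += w_cont * sum((x - y) ** 2 for x, y in zip(params, frames[t - 1]))
    return cost


# A uniform 17.5 cm tube behaves as a quarter-wave resonator.
print(formants([3.0] * 35, [0.5] * 35))  # close to 500, 1500, 2500 Hz
```

In a full analysis-by-synthesis loop, a cost of this kind would be minimized over the per-frame parameter vectors, for example with a quasi-Newton routine started from a codebook estimate as the abstract describes.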

Similar articles

2
Vocal tract representation in the recognition of cerebral palsied speech.
J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.
7
Articulatory distinctiveness of vowels and consonants: a data-driven approach.
J Speech Lang Hear Res. 2013 Oct;56(5):1539-51. doi: 10.1044/1092-4388(2013/12-0030). Epub 2013 Jul 9.

References cited in this article

5
Model for wave propagation in a lossy vocal tract.
J Acoust Soc Am. 1974 May;55(5):1070-5. doi: 10.1121/1.1914649.
6
Articulatory model for the study of speech production.
J Acoust Soc Am. 1973 Apr;53(4):1070-82. doi: 10.1121/1.1913427.
