A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.

Affiliation

Department of Electrical Engineering, University of California, Los Angeles, California 90095, USA.

Publication information

J Acoust Soc Am. 2011 Apr;129(4):2144-62. doi: 10.1121/1.3514544.

Abstract

In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
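The cost structure described in the abstract (a formant distance plus parameter regularization and continuity terms, with the search started from a codebook lookup) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: `toy_synthesize` is a made-up linear map standing in for the articulatory synthesizer (it is not the Maeda model or a chain matrix calculation), the relative squared formant error and the weights `w_reg` and `w_cont` are arbitrary choices, and the codebook has only two entries. The abstract does not give the exact form of any of these terms.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def toy_synthesize(params: Vector) -> Tuple[float, float, float]:
    """Hypothetical stand-in synthesizer: 2 parameters -> 3 formants in Hz.

    This is NOT the Maeda model; it only makes the sketch runnable.
    """
    p0, p1 = params
    return (500.0 + 100.0 * p0, 1500.0 + 300.0 * p1, 2500.0 + 50.0 * p0)


def formant_distance(synth: Vector, target: Vector) -> float:
    """Sum of squared relative errors over the first three formants (assumed form)."""
    return sum(((s - t) / t) ** 2 for s, t in zip(synth[:3], target[:3]))


def inversion_cost(
    trajectory: Sequence[Vector],
    targets: Sequence[Vector],
    synthesize: Callable[[Vector], Vector] = toy_synthesize,
    w_reg: float = 0.01,   # assumed regularization weight
    w_cont: float = 0.1,   # assumed continuity weight
) -> float:
    """Formant distance + regularization + continuity, summed over frames."""
    total = 0.0
    previous = None
    for params, target in zip(trajectory, targets):
        total += formant_distance(synthesize(params), target)
        # Regularization: penalize parameters far from the neutral value 0.
        total += w_reg * sum(p * p for p in params)
        # Continuity: penalize jumps between consecutive frames.
        if previous is not None:
            total += w_cont * sum((p - q) ** 2 for p, q in zip(params, previous))
        previous = params
    return total


def codebook_init(
    targets: Sequence[Vector],
    codebook: Sequence[Tuple[Vector, Vector]],
) -> List[Vector]:
    """Per frame, pick the codebook parameters whose stored formants are closest."""
    return [
        min(codebook, key=lambda entry: formant_distance(entry[1], target))[0]
        for target in targets
    ]


# Tiny two-entry codebook of (parameters, formants) pairs, built from the toy map.
codebook = [
    ((0.0, 0.0), toy_synthesize((0.0, 0.0))),
    ((1.0, 0.0), toy_synthesize((1.0, 0.0))),
]
targets = [(510.0, 1500.0, 2505.0), (590.0, 1500.0, 2545.0)]
start = codebook_init(targets, codebook)
print(start, inversion_cost(start, targets))
```

In a setup like the one the abstract describes, a trajectory initialized this way would then be refined by a gradient-based optimizer such as a quasi-Newton method. This sketch stops at evaluating the cost and does not implement that step or the chain matrix derivatives.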

Similar articles

2
Vocal tract representation in the recognition of cerebral palsied speech.
J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.
3
Vocal tract normalization for midsagittal articulatory recovery with analysis-by-synthesis.
J Acoust Soc Am. 1999 Aug;106(2):1090-105. doi: 10.1121/1.427117.
4
A modular architecture for articulatory synthesis from gestural specification.
J Acoust Soc Am. 2019 Dec;146(6):4458. doi: 10.1121/1.5139413.
6
Acquisition of vowel articulation in childhood investigated by acoustic-to-articulatory inversion.
Infant Behav Dev. 2017 Feb;46:178-193. doi: 10.1016/j.infbeh.2017.01.007. Epub 2017 Feb 20.
7
Articulatory distinctiveness of vowels and consonants: a data-driven approach.
J Speech Lang Hear Res. 2013 Oct;56(5):1539-51. doi: 10.1044/1092-4388(2013/12-0030). Epub 2013 Jul 9.
8
Variability of articulator positions and formants across nine English vowels.
J Phon. 2018 May;68:1-14. doi: 10.1016/j.wocn.2018.01.003. Epub 2018 Feb 23.
9
Modeling the effect of palate shape on the articulatory-acoustics mapping.
J Acoust Soc Am. 2018 Jul;144(1):EL71. doi: 10.1121/1.5048043.
10
An evaluation of articulatory working space area in vowel production of adults with Down syndrome.
Clin Linguist Phon. 2011 Apr;25(4):321-34. doi: 10.3109/02699206.2010.535647. Epub 2010 Nov 22.

Cited by

1
Discrete constriction locations describe a comprehensive range of vocal tract shapes in the Maeda model.
JASA Express Lett. 2021 Dec;1(12):124402. doi: 10.1121/10.0009058. Epub 2021 Dec 28.
2
Pathological Voice Source Analysis System Using a Flow Waveform-Matched Biomechanical Model.
Appl Bionics Biomech. 2018 Jul 2;2018:3158439. doi: 10.1155/2018/3158439. eCollection 2018.
3
Statistical Methods for Estimation of Direct and Differential Kinematics of the Vocal Tract.
Speech Commun. 2013 Jan;55(1):147-161. doi: 10.1016/j.specom.2012.08.001.
