An episodic memory-based solution for the acoustic-to-articulatory inversion problem.

Affiliation

Université de Lorraine, Laboratoire Lorrain de Recherche en Informatique et ses Applications, Unité de Recherche Mixte 7503, Vandœuvre-lès-Nancy, F-54506, France.

Publication information

J Acoust Soc Am. 2013 May;133(5):2921-30. doi: 10.1121/1.4798665.

DOI:10.1121/1.4798665

PMID:23654397

Abstract

This paper presents an acoustic-to-articulatory inversion method based on an episodic memory. An episodic memory is an interesting model for two reasons. First, it does not rely on any assumptions about the mapping function but rather it relies on real synchronized acoustic and articulatory data streams. Second, the memory inherently represents the real articulatory dynamics as observed. It is argued that the computational models of episodic memory, as they are usually designed, cannot provide a satisfying solution for the acoustic-to-articulatory inversion problem due to the insufficient quantity of training data. Therefore, an episodic memory is proposed, called generative episodic memory (G-Mem), which is able to produce articulatory trajectories that do not belong to the set of episodes the memory is based on. The generative episodic memory is evaluated using two electromagnetic articulography corpora: one for English and one for French. Comparisons with a codebook-based method and with a classical episodic memory (which is termed concatenative episodic memory) are presented in order to evaluate the proposed generative episodic memory in terms of both its modeling of articulatory dynamics and its generalization capabilities. The results show the effectiveness of the method where an overall root-mean-square error of 1.65 mm and a correlation of 0.71 are obtained for the G-Mem method. They are comparable to those of methods recently proposed.
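The two figures reported in the abstract, root-mean-square error (in mm) and correlation, compare an estimated articulatory trajectory with the measured one. A minimal sketch of how such metrics are commonly computed for a single articulatory channel follows. The function names and the toy values are illustrative assumptions, and the abstract does not say how the paper aggregates results across channels or utterances.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between two equal-length trajectories.

    The result is in the unit of the inputs (e.g. mm for coil positions).
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def pearson(measured, predicted):
    """Pearson correlation coefficient between two equal-length trajectories."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("trajectories must be non-empty and of equal length")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_m * var_p)


# Toy values (not data from the paper): one coordinate of one coil, in mm.
measured_mm = [0.0, 1.0, 2.0, 3.0]
predicted_mm = [1.0, 2.0, 3.0, 4.0]

print(rmse(measured_mm, predicted_mm))
print(pearson(measured_mm, predicted_mm))
```

The toy trajectories differ by a constant offset, which illustrates why the two metrics are reported together: the correlation reflects only whether the shape of the movement is reproduced, while the RMSE also penalizes the offset.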
