Spatio-temporal articulatory movement primitives during speech production: extraction, interpretation, and validation.

Affiliation

Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA.

Publication information

J Acoust Soc Am. 2013 Aug;134(2):1378-94. doi: 10.1121/1.4812765.

Abstract

This paper presents a computational approach to deriving interpretable movement primitives from speech articulation data. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatiotemporal basis sequences and an activation matrix. The algorithm optimizes a cost function that trades off the mismatch between the proposed model and the input data against the number of primitives that are active at any given instant. The method is applied both to measured articulatory data obtained through electromagnetic articulography and to synthetic data generated using an articulatory synthesizer. The paper then describes how to evaluate the algorithm's performance quantitatively and further performs a qualitative assessment of the algorithm's ability to recover compositional structure from data. This is done using pseudo ground-truth primitives generated by the articulatory synthesizer based on an Articulatory Phonology framework [Browman and Goldstein (1995). "Dynamics and articulatory phonology," in Mind as motion: Explorations in the dynamics of cognition, edited by R. F. Port and T. van Gelder (MIT Press, Cambridge, MA), pp. 175-194]. The results suggest that the proposed algorithm extracts movement primitives from human speech production data that are linguistically interpretable. Such a framework might aid the understanding of longstanding issues in speech production such as motor control and coarticulation.
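
The decomposition and the trade-off in the cost function can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact mismatch measure or the form of the sparseness constraint, so squared error and an L1 penalty on the activations are used here as stand-ins, and all names (`reconstruct`, `cost`, `lam`) are hypothetical. The sketch only evaluates the convolutive model, in which each basis sequence spans several time steps and is shifted in time by its activation; it does not perform the optimization.

```python
# Convolutive model (assumed form):
#   V_hat[m][n] = sum over t, k of W[t][m][k] * H[k][n - t]
# W is a list of T matrices (M x K): basis k is the M x T sequence W[0..T-1][:, k].
# H is the K x N activation matrix.

def reconstruct(W, H):
    """Rebuild the M x N data matrix from basis sequences W and activations H."""
    T, M, K, N = len(W), len(W[0]), len(H), len(H[0])
    V_hat = [[0.0] * N for _ in range(M)]
    for t in range(T):                      # time offset within a basis sequence
        for m in range(M):                  # articulatory channel
            for k in range(K):              # primitive index
                w = W[t][m][k]
                if w == 0.0:
                    continue
                for n in range(t, N):       # activation at n - t contributes at n
                    V_hat[m][n] += w * H[k][n - t]
    return V_hat

def cost(V, W, H, lam=0.1):
    """Squared mismatch plus an L1 penalty on activations (both are assumptions)."""
    V_hat = reconstruct(W, H)
    mismatch = sum((v - vh) ** 2
                   for row_v, row_vh in zip(V, V_hat)
                   for v, vh in zip(row_v, row_vh))
    activity = sum(abs(h) for row in H for h in row)
    return mismatch + lam * activity

# Toy example: one primitive (K=1) spanning two time steps (T=2), two channels (M=2).
W = [[[1.0], [0.0]],    # offset 0: channel 0 moves
     [[0.0], [2.0]]]    # offset 1: channel 1 moves one step later
H = [[1.0, 0.0, 3.0]]   # the primitive is triggered at n=0 and n=2
V = reconstruct(W, H)
print(V)
print(cost(V, W, H, lam=0.5))
```

Raising `lam` in this sketch penalizes activity more heavily, which corresponds to favoring fewer active primitives at the expense of a larger mismatch.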
