Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA.
J Acoust Soc Am. 2013 Aug;134(2):1378-94. doi: 10.1121/1.4812765.
This paper presents a computational approach to derive interpretable movement primitives from speech articulation data. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatiotemporal basis sequences and an activation matrix. The algorithm optimizes a cost function that trades off the mismatch between the proposed model and the input data against the number of primitives that are active at any given instant. The method is applied both to measured articulatory data obtained through electromagnetic articulography and to synthetic data generated using an articulatory synthesizer. The paper then describes how to evaluate the algorithm's performance quantitatively and further performs a qualitative assessment of the algorithm's ability to recover compositional structure from data. This is done using pseudo ground-truth primitives generated by the articulatory synthesizer based on an Articulatory Phonology framework [Browman and Goldstein (1995). "Dynamics and articulatory phonology," in Mind as motion: Explorations in the dynamics of cognition, edited by R. F. Port and T. van Gelder (MIT Press, Cambridge, MA), pp. 175-194]. The results suggest that the proposed algorithm extracts movement primitives from human speech production data that are linguistically interpretable. Such a framework might aid the understanding of longstanding issues in speech production such as motor control and coarticulation.
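To make the decomposition described in the abstract concrete, the following is a minimal sketch of a convolutive NMF model and a sparsity-penalized cost, not the paper's cNMFsc implementation. The names `shift_right`, `reconstruct`, and `cost`, the array shapes, the L1 penalty on the activations, and the weight `lam` are all assumptions made for illustration; the abstract states only that the cost trades off model mismatch against the number of active primitives, and the paper's actual sparseness constraint and optimization procedure may differ.

```python
import numpy as np

def shift_right(H, t):
    """Shift the columns of H right by t time steps, zero-filling on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out

def reconstruct(W, H):
    """Convolutive model: V_hat = sum over t of W[t] @ shift_right(H, t).

    W: (T, M, K) array, K basis sequences of T frames over M data dimensions.
    H: (K, N) nonnegative activation matrix over N time samples.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

def cost(V, W, H, lam=0.1):
    """Squared reconstruction error plus an L1 penalty on the activations.

    The L1 term is an assumed stand-in for a sparseness constraint; for
    nonnegative H it is simply the sum of the entries.
    """
    err = 0.5 * np.sum((V - reconstruct(W, H)) ** 2)
    return err + lam * np.sum(H)

# Hypothetical toy data: 2 primitives, 3 frames each, 4 channels, 10 samples.
rng = np.random.default_rng(0)
W = rng.random((3, 4, 2))
H = np.zeros((2, 10))
H[0, 1] = 1.0   # primitive 0 is triggered at sample 1
H[1, 5] = 0.5   # primitive 1 is triggered at sample 5
V = reconstruct(W, H)
```

In this toy setup `V` is generated exactly by the model, so the reconstruction error is zero and `cost(V, W, H)` reduces to the penalty term. A full algorithm would additionally have to estimate `W` and `H` from `V` under nonnegativity constraints, which this sketch does not attempt.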