Lammert Adam, Goldstein Louis, Narayanan Shrikanth, Iskarous Khalil
Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA.
Speech Commun. 2013 Jan;55(1):147-161. doi: 10.1016/j.specom.2012.08.001.
We present and evaluate two statistical methods for estimating kinematic relationships of the speech production system: Artificial Neural Networks and Locally-Weighted Regression. The work is motivated by the need to characterize this motor system, with particular focus on estimating differential aspects of kinematics. Kinematic analysis will facilitate progress in a variety of areas, including the nature of speech production goals, articulatory redundancy and, relatedly, acoustic-to-articulatory inversion. Because these relationships are infeasible to express in closed form, they must be estimated from data using statistical methods. The statistical models are optimized and evaluated on two sets of synthetic speech data using a held-out data validation procedure. The theoretical and practical advantages of both methods are also discussed. It is shown that both direct and differential kinematics can be estimated with high accuracy, even for complex, nonlinear relationships. Locally-Weighted Regression displays the best overall performance, which may be due to practical advantages in its training procedure. Moreover, judging by the convergence of performance, accurate estimation can be achieved with only a modest amount of training data. The algorithms are also applied to real-time MRI data, and the results are generally consistent with those obtained from synthetic data.
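The abstract gives no implementation details, but the idea of estimating both direct and differential kinematics with Locally-Weighted Regression can be illustrated compactly. The sketch below fits a weighted linear model around each query point: its intercept estimates the output (direct kinematics) and its slope matrix estimates the Jacobian (differential kinematics). This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian kernel, the candidate bandwidths, and the hypothetical two-link planar arm (with illustrative link lengths `L1`, `L2`) standing in for an articulatory model are all choices made here for illustration. The bandwidth is selected with a simple held-out split, loosely in the spirit of the validation procedure the abstract mentions.

```python
import numpy as np


def lwr_predict(X_train, Y_train, x_query, bandwidth):
    """Locally-weighted linear regression evaluated at one query point.

    Returns (y_hat, J_hat): the local model's intercept (the prediction at
    x_query) and its slope matrix, used here as an estimate of dY/dX.
    """
    diff = X_train - x_query  # offsets from the query point, shape (n, d)
    # Assumption: Gaussian kernel weights; other kernels are possible.
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / bandwidth**2)
    A = np.hstack([np.ones((len(diff), 1)), diff])  # bias column + centered inputs
    sw = np.sqrt(w)[:, None]
    # Weighted least squares, solved as ordinary least squares on sqrt-weighted rows.
    beta, *_ = np.linalg.lstsq(A * sw, Y_train * sw, rcond=None)  # shape (d+1, m)
    return beta[0], beta[1:].T  # y_hat: (m,), J_hat: (m, d)


def heldout_rmse(X_tr, Y_tr, X_val, Y_val, bandwidth):
    """Root-mean-square error of LWR predictions on held-out data."""
    preds = np.array([lwr_predict(X_tr, Y_tr, x, bandwidth)[0] for x in X_val])
    return float(np.sqrt(np.mean((preds - Y_val) ** 2)))


# Hypothetical toy "articulator": a planar two-link arm with known kinematics,
# used only so the estimates can be checked against exact values.
L1, L2 = 1.0, 0.8


def forward(q):
    """Endpoint position (x, y) for joint angles q = (q1, q2)."""
    q1, q2 = q[..., 0], q[..., 1]
    return np.stack([L1 * np.cos(q1) + L2 * np.cos(q1 + q2),
                     L1 * np.sin(q1) + L2 * np.sin(q1 + q2)], axis=-1)


def jacobian(q):
    """Analytic Jacobian d(x, y)/d(q1, q2) of the toy arm."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [L1 * c1 + L2 * c12, L2 * c12]])


rng = np.random.default_rng(0)
Q = rng.uniform(0.2, 1.4, size=(2000, 2))  # sampled joint configurations
Y = forward(Q)
Q_tr, Y_tr, Q_val, Y_val = Q[:1500], Y[:1500], Q[1500:], Y[1500:]

# Choose the kernel bandwidth by held-out validation error (illustrative grid).
bandwidths = [0.05, 0.1, 0.2, 0.4]
errors = {h: heldout_rmse(Q_tr, Y_tr, Q_val, Y_val, h) for h in bandwidths}
best_h = min(errors, key=errors.get)

# Estimate direct and differential kinematics at one interior query point.
q0 = np.array([0.8, 0.7])
y_hat, J_hat = lwr_predict(Q_tr, Y_tr, q0, best_h)
pos_err = np.abs(y_hat - forward(q0)).max()
jac_err = np.abs(J_hat - jacobian(q0)).max()
print(f"bandwidth={best_h}, position error={pos_err:.4f}, Jacobian error={jac_err:.4f}")
```

The slope of the local fit comes for free from the same weighted least-squares solve, which is one reason a local method is a natural fit for differential kinematics; a neural network would instead need its Jacobian obtained by differentiating the trained model.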