Signal Analysis and Interpretation Laboratory, Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA.
J Acoust Soc Am. 2011 Jun;129(6):4014-22. doi: 10.1121/1.3573987.
Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, this paper presents a computational analysis of articulatory movement data recorded during speech production, together with concurrently recorded acoustic speech signals, from multiple subjects in three languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is assumed to be reflected in articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank, which realizes a range of candidate filterbank structures as the parameter is varied. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented by the output of a filterbank similar to the empirically established cochlear filterbank of the human auditory system. Possible interpretations of this finding are discussed.
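The information-theoretic core of the analysis can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a hypothetical Gaussian-shaped filterbank whose bandwidths follow the Glasberg-Moore ERB rule scaled by a sweep parameter `alpha` (standing in for the paper's filterbank parameter), and a simple histogram estimate of mutual information between a band energy and an articulatory variable (the "uncertainty reduction" quantity).

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X; Y) in bits between two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                      # joint distribution
    px = pxy.sum(axis=1, keepdims=True)            # marginal of x
    py = pxy.sum(axis=0, keepdims=True)            # marginal of y
    nz = pxy > 0                                   # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def filterbank_energies(signal, fs, center_freqs, alpha=1.0):
    """Band energies from Gaussian frequency-domain filters.

    Bandwidths follow an ERB-like rule scaled by `alpha`, a hypothetical
    parameter mimicking the paper's filterbank sweep; alpha = 1.0
    approximates the cochlear ERB scale (Glasberg-Moore).
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    energies = []
    for fc in center_freqs:
        bw = alpha * 24.7 * (4.37 * fc / 1000.0 + 1.0)   # ERB in Hz
        weights = np.exp(-0.5 * ((freqs - fc) / bw) ** 2)
        energies.append(np.sum(weights * spectrum ** 2))
    return np.array(energies)
```

In the study's setting, one would compute such band energies frame by frame, estimate the mutual information between each representation and the measured articulator trajectories, and repeat the sweep over `alpha` to find the filterbank that maximally reduces articulatory uncertainty.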