Alexandrou Anna Maria, Saarinen Timo, Kujala Jan, Salmelin Riitta
Department of Neuroscience and Biomedical Engineering, Aalto University, FI-00076 AALTO, Finland.
J Acoust Soc Am. 2016 Jan;139(1):215-26. doi: 10.1121/1.4939496.
Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
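As a rough illustration of what a coherence analysis between EMG and acoustic signals might look like, the sketch below computes magnitude-squared coherence between a rectified EMG trace and the acoustic amplitude envelope. It is not the authors' pipeline, which the abstract does not describe. The function name `emg_acoustic_coherence`, the rectification and Hilbert-envelope steps, the sampling rate, the window length, and the synthetic 4 Hz test signals are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import coherence, hilbert


def emg_acoustic_coherence(emg, audio, fs, nperseg):
    """Magnitude-squared coherence between rectified EMG and the acoustic envelope.

    Hypothetical helper: the preprocessing choices here are assumptions,
    not the procedure used in the study.
    """
    emg_rect = np.abs(emg - np.mean(emg))   # remove offset, then full-wave rectify
    envelope = np.abs(hilbert(audio))       # amplitude envelope via analytic signal
    return coherence(emg_rect, envelope, fs=fs, nperseg=nperseg)


# Synthetic data (assumed): both signals share a 4 Hz amplitude modulation.
fs = 1000.0
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / fs)
rhythm = 0.5 * (1 + np.sin(2 * np.pi * 4.0 * t))
emg = rhythm * rng.standard_normal(t.size)                  # noise-like EMG burst pattern
audio = rhythm * np.sin(2 * np.pi * 150 * t) + 0.1 * rng.standard_normal(t.size)

freqs, coh = emg_acoustic_coherence(emg, audio, fs, nperseg=4096)

# Look for the dominant shared rhythm in a low-frequency band (1-10 Hz, assumed).
band = (freqs >= 1) & (freqs <= 10)
peak_freq = freqs[band][np.argmax(coh[band])]
print(f"peak coherence {coh[band].max():.2f} near {peak_freq:.2f} Hz")
```

With real recordings, the window length trades frequency resolution against the number of averaged segments, so it would need to be chosen with the expected speech-rate range in mind.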