Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel.
Sagol Center for Brain and Mind, Interdisciplinary Center, Herzliya, Israel.
PLoS One. 2021 May 3;16(5):e0250969. doi: 10.1371/journal.pone.0250969. eCollection 2021.
Automatic speech recognition (ASR) and natural language processing (NLP) are expected to benefit from an effective, simple, and reliable method to automatically parse conversational speech. The ability to parse conversational speech depends crucially on the ability to identify boundaries between prosodic phrases. The human ear does this naturally, yet it has proved surprisingly difficult to automate simply and reliably. Efforts to date have focused on detecting phrase boundaries using a variety of linguistic and acoustic cues. We propose a method that requires no model training and uses two prosodic cues derived from ASR output. Boundaries are identified using discontinuities in speech rate (pre-boundary lengthening and phrase-initial acceleration) and silent pauses. The resulting phrases preserve syntactic validity, exhibit pitch reset, and compare well with manual tagging of prosodic boundaries. Collectively, our findings support the notion of prosodic phrases that represent coherent patterns across textual and acoustic parameters.
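The two cues named in the abstract can be illustrated with a minimal Python sketch over word-level ASR timestamps. This is not the authors' implementation: the `Word` type, the seconds-per-character rate proxy, and the `min_pause` and `slowdown_ratio` thresholds are all hypothetical choices made for illustration, and the timings in the example are invented. A boundary is placed after a word when it is followed by a silent pause, or when a slow (lengthened) word is immediately followed by a much faster one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # word onset in seconds, as reported by an ASR system
    end: float    # word offset in seconds


def seconds_per_char(word: Word) -> float:
    """Crude speech-rate proxy (an assumption): duration per character."""
    return (word.end - word.start) / max(len(word.text), 1)


def find_boundaries(words: List[Word],
                    min_pause: float = 0.2,       # hypothetical threshold
                    slowdown_ratio: float = 2.0   # hypothetical threshold
                    ) -> List[int]:
    """Return indices i such that a phrase boundary is placed after words[i]."""
    boundaries = []
    for i in range(len(words) - 1):
        cur, nxt = words[i], words[i + 1]
        # Cue 1: silent pause between consecutive words.
        has_pause = (nxt.start - cur.end) >= min_pause
        # Cue 2: rate discontinuity, i.e. a lengthened word followed by
        # an accelerated one.
        rate_jump = seconds_per_char(cur) >= slowdown_ratio * seconds_per_char(nxt)
        if has_pause or rate_jump:
            boundaries.append(i)
    return boundaries


def split_phrases(words: List[Word], boundaries: List[int]) -> List[List[str]]:
    """Group word texts into phrases using the boundary indices."""
    phrases, start = [], 0
    for b in boundaries:
        phrases.append([w.text for w in words[start:b + 1]])
        start = b + 1
    if start < len(words):
        phrases.append([w.text for w in words[start:]])
    return phrases


# Invented timings, for illustration only.
words = [
    Word("so", 0.00, 0.15), Word("I", 0.15, 0.22), Word("went", 0.22, 0.42),
    Word("home", 0.42, 0.90),   # lengthened word
    Word("and", 0.92, 1.00),    # fast phrase-initial word
    Word("then", 1.00, 1.20), Word("slept", 1.20, 1.50),
    Word("okay", 2.00, 2.30),   # preceded by a 0.5 s silence
]
boundaries = find_boundaries(words)
phrases = split_phrases(words, boundaries)
print(phrases)
```

In this toy input, the first boundary comes from the rate discontinuity alone (the gap after "home" is only 0.02 s) and the second from the silent pause alone. In practice, both thresholds would need tuning for the speaker and the ASR system.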