Vivek Kumar Rangarajan Sridhar, Shrikanth Narayanan, Srinivas Bangalore
Speech Analysis and Interpretation Laboratory, University of Southern California, Viterbi School of Engineering
Proc IEEE Int Conf Acoust Speech Signal Process. 2008;4518789:5033-5036. doi: 10.1109/ICASSP.2008.4518789.
Prosody is an important cue for identifying dialog acts. In this paper, we show that modeling the sequence of acoustic-prosodic values as n-gram features in a maximum entropy model for dialog act (DA) tagging can outperform conventional approaches that use a coarse representation of the prosodic contour through acoustic correlates of prosody. We also propose a discriminative framework that exploits preceding context in the form of lexical and prosodic cues from previous discourse segments. Such a scheme facilitates online DA tagging and offers robustness in the decoding process, unlike greedy decoding schemes that can potentially propagate errors. Using only lexical and prosodic cues from the 3 previous utterances, we achieve a DA tagging accuracy of 72%, compared to 74% in the best-case scenario with accurate knowledge of the previous DA tag.
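The feature scheme described in the abstract can be illustrated with a minimal sketch. It assumes the prosodic contour is a list of per-frame f0 values, quantized into equal-width bins before n-grams are extracted. The abstract does not specify the quantization method, the bin count, or the n-gram order, so `n_bins`, `max_n`, and all function and feature names here are hypothetical. The resulting sparse feature dictionary is the kind of input a maximum entropy (multinomial logistic regression) classifier would consume; the classifier itself is omitted.

```python
from collections import Counter


def quantize(values, n_bins=5):
    """Map each value to a bin index using equal-width bins over the
    utterance's own range (an assumed scheme, not taken from the paper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def ngram_features(symbols, prefix, max_n=3):
    """Count all n-grams of order 1..max_n over a symbol sequence."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(symbols) - n + 1):
            key = prefix + "_" + "_".join(str(s) for s in symbols[i:i + n])
            feats[key] += 1
    return feats


def utterance_features(words, f0, max_n=3):
    """Lexical n-grams plus n-grams over the quantized f0 contour."""
    feats = ngram_features(words, "w", max_n)
    feats.update(ngram_features(quantize(f0), "f0", max_n))
    return feats


def context_features(utterances, index, history=3, max_n=3):
    """Features for utterance `index` plus up to `history` previous
    utterances, each tagged with its offset. No previous DA tags are
    used, so no earlier tagging decision is needed as input."""
    feats = Counter()
    for offset in range(history + 1):
        j = index - offset
        if j < 0:
            break
        words, f0 = utterances[j]
        for name, count in utterance_features(words, f0, max_n).items():
            feats[f"t-{offset}:{name}"] += count
    return feats


# Toy dialog: (words, f0 contour in Hz); the values are made up.
dialog = [
    (["hi"], [100.0, 120.0]),
    (["ok"], [90.0, 90.0]),
]
features = context_features(dialog, index=1)
```

Because the context features depend only on observed words and prosody from earlier utterances, each utterance can be tagged as it arrives, without committing to earlier tag predictions.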