MODELING THE INTONATION OF DISCOURSE SEGMENTS FOR IMPROVED ONLINE DIALOG ACT TAGGING.

Authors

Vivek Kumar Rangarajan Sridhar, Narayanan Shrikanth, Bangalore Srinivas

Affiliation

Speech Analysis and Interpretation Laboratory, University of Southern California, Viterbi School of Engineering,

Publication information

Proc IEEE Int Conf Acoust Speech Signal Process. 2008;4518789:5033-5036. doi: 10.1109/ICASSP.2008.4518789.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2614672/

Abstract

Prosody is an important cue for identifying dialog acts. In this paper, we show that modeling the sequence of acoustic-prosodic values as n-gram features in a maximum entropy model for dialog act (DA) tagging can outperform conventional approaches, which use a coarse representation of the prosodic contour through acoustic correlates of prosody. We also propose a discriminative framework that exploits preceding context in the form of lexical and prosodic cues from previous discourse segments. Such a scheme facilitates online DA tagging and offers robustness in the decoding process, unlike greedy decoding schemes that can potentially propagate errors. Using only lexical and prosodic cues from the 3 previous utterances, we achieve a DA tagging accuracy of 72%, compared with 74% in the best-case scenario where the previous DA tag is known accurately.
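The feature scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how prosodic values are discretized or which feature templates are used, so the equal-width binning, the n-gram order, the feature naming, and the function names `quantize`, `ngram_features`, `utterance_features`, and `context_features` are all assumptions made for illustration.

```python
from collections import Counter


def quantize(values, n_bins=5):
    """Map real-valued prosodic measurements (e.g. pitch samples) to bin indices.

    Equal-width binning is an assumption; the abstract does not specify a scheme.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def ngram_features(symbols, n_max=3, prefix="f0"):
    """Count all n-grams up to order n_max over a symbol sequence."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(symbols) - n + 1):
            gram = "_".join(str(s) for s in symbols[i:i + n])
            feats[f"{prefix}:{n}g:{gram}"] += 1
    return feats


def utterance_features(words, f0):
    """Combine lexical n-grams and quantized prosodic n-grams for one utterance."""
    feats = Counter()
    feats.update(ngram_features(words, prefix="w"))
    feats.update(ngram_features(quantize(f0), prefix="f0"))
    return feats


def context_features(history, current, k=3):
    """Add features from up to k previous utterances, tagged by their distance.

    No previous DA tags are used, so a wrong earlier prediction cannot propagate.
    """
    feats = Counter(current)
    for dist, prev in enumerate(reversed(history[-k:]), start=1):
        for name, count in prev.items():
            feats[f"prev{dist}|{name}"] += count
    return feats
```

The resulting sparse feature dictionaries could be passed to any maximum entropy (multinomial logistic regression) classifier. Because the context consists only of observed lexical and prosodic cues, each utterance can be tagged as soon as it arrives, which is what makes the scheme suitable for online use.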
