Can prosody aid the automatic classification of dialog acts in conversational speech?

Author information

Shriberg E, Bates R, Stolcke A, Taylor P, Jurafsky D, Ries K, Coccaro N, Martin R, Meteer M, van Ess-Dykema C

Affiliation

SRI International, Menlo Park, CA 94025, USA.

Publication information

Lang Speech. 1998 Jul-Dec;41 ( Pt 3-4):443-92. doi: 10.1177/002383099804100410.

DOI:10.1177/002383099804100410
PMID:10746366
Abstract

Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking rate) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. Performance was evaluated for prosody models alone, and after combining the prosody models with word information--either from true words or from the output of an automatic speech recognizer. For an overall classification task, as well as three subtasks, prosody made significant contributions to classification. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone, especially for the case of recognized words. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.
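
The abstract reports that combining the prosodic decision trees with a DA-specific statistical language model outperformed the language model alone, but it does not spell out the combination rule. Below is a minimal sketch of one common scheme, assumed here rather than taken from the paper. It treats the words W and the prosodic features F as conditionally independent given the dialog act U, so that P(U | W, F) ∝ P(U) P(W | U) P(F | U). It then turns the tree's posterior into a scaled likelihood via P(F | U) ∝ P(U | F) / P(U). The helper `combine_da_scores`, its `prosody_weight` exponent, the two-label inventory, and the toy probabilities are all hypothetical.

```python
import math


def combine_da_scores(prior, lm_loglik, tree_posterior, prosody_weight=1.0):
    """Return P(U | W, F) for each dialog-act label U (hypothetical helper).

    prior:          dict label -> P(U), the dialog-act prior.
    lm_loglik:      dict label -> log P(W | U) from a DA-specific language model.
    tree_posterior: dict label -> P(U | F) from a prosodic decision tree.
    prosody_weight: assumed exponent scaling the prosodic evidence.

    Assumes W and F are conditionally independent given U, so
    log P(U | W, F) = log P(U) + log P(W | U)
                      + w * (log P(U | F) - log P(U)) + const.
    """
    scores = {}
    for label, p_u in prior.items():
        # Scaled prosodic likelihood: log P(F | U) up to a constant.
        prosody_loglik = math.log(tree_posterior[label]) - math.log(p_u)
        scores[label] = math.log(p_u) + lm_loglik[label] + prosody_weight * prosody_loglik
    # Normalise with log-sum-exp to avoid underflow on long utterances.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {label: math.exp(s - log_z) for label, s in scores.items()}


# Toy example (hypothetical numbers): the words are equally likely under both
# DAs, e.g. an ambiguous recognizer output, so the prosodic tree decides.
prior = {"statement": 0.8, "question": 0.2}
lm_loglik = {"statement": math.log(1e-3), "question": math.log(1e-3)}
tree_posterior = {"statement": 0.3, "question": 0.7}

combined = combine_da_scores(prior, lm_loglik, tree_posterior)
words_only = combine_da_scores(prior, lm_loglik, tree_posterior, prosody_weight=0.0)
print(combined)    # question ≈ 0.7, statement ≈ 0.3
print(words_only)  # falls back to the prior: statement ≈ 0.8, question ≈ 0.2
```

Setting `prosody_weight=0.0` reduces the sketch to the words-only model, which loosely mirrors the comparison the abstract describes (language model alone versus language model plus prosody); the weight itself is an illustrative knob, not a parameter reported in the study.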

Similar articles

1
Can prosody aid the automatic classification of dialog acts in conversational speech?
Lang Speech. 1998 Jul-Dec;41 ( Pt 3-4):443-92. doi: 10.1177/002383099804100410.
2
Intonation and dialog context as constraints for speech recognition.
Lang Speech. 1998 Jul-Dec;41 ( Pt 3-4):493-512. doi: 10.1177/002383099804100411.
3
An interaction between prosody and statistics in the segmentation of fluent speech.
Cogn Psychol. 2007 Feb;54(1):1-32. doi: 10.1016/j.cogpsych.2006.04.002. Epub 2006 Jun 19.
4
Exploring prosody in interaction control.
Phonetica. 2005 Apr-Dec;62(2-4):215-26. doi: 10.1159/000090099. Epub 2005 Dec 29.
5
Beyond the particular: prosody and the coordination of actions.
Lang Speech. 2012 Mar;55(Pt 1):13-34. doi: 10.1177/0023830911428871.
6
Reflections on studying prosody in talk-in-interaction.
Lang Speech. 1998 Jul-Dec;41 ( Pt 3-4):235-63. doi: 10.1177/002383099804100402.
7
Effects of compatible versus competing rhythmic grouping on errors and timing variability in speech.
Lang Speech. 2014 Dec;57(Pt 4):544-62. doi: 10.1177/0023830913512776.
8
An analysis of turn-taking and backchannels based on prosodic and syntactic features in Japanese map task dialogs.
Lang Speech. 1998 Jul-Dec;41 ( Pt 3-4):295-321. doi: 10.1177/002383099804100404.
9
Prosody leaks into the memories of words.
Cognition. 2021 May;210:104601. doi: 10.1016/j.cognition.2021.104601. Epub 2021 Jan 25.
10
The Relationship between Prosodic Ability and Conversational Prosodic Entrainment.
Speech Prosody. 2020 May;2020:769-773. doi: 10.21437/speechprosody.2020-157.

Cited by

1
Structure in conversation: Evidence for the vocabulary, semantics, and syntax of prosody.
Proc Natl Acad Sci U S A. 2025 Apr 29;122(17):e2403262122. doi: 10.1073/pnas.2403262122. Epub 2025 Apr 21.
2
Hierarchical temporal structure in music, speech and animal vocalizations: jazz is like a conversation, humpbacks sing like hermit thrushes.
J R Soc Interface. 2017 Oct;14(135). doi: 10.1098/rsif.2017.0231.
3
Supervised and Unsupervised Feature Selection for Inferring Social Nature of Telephone Conversations from Their Content.
Proc IEEE Workshop Autom Speech Recognit Underst. 2008 Apr 3;1:378-384. doi: 10.1109/ICCV.2003.1238369. Epub 2003 Oct 13.
4
Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework.
IEEE Trans Audio Speech Lang Process. 2008;16(4):797-811. doi: 10.1109/TASL.2008.917071.
5
AUTOMATIC CLASSIFICATION OF QUESTION TURNS IN SPONTANEOUS SPEECH USING LEXICAL AND PROSODIC EVIDENCE.
Proc IEEE Int Conf Acoust Speech Signal Process. 2008;4518782:5005-5008. doi: 10.1109/ICASSP.2008.4518782.
6
MODELING THE INTONATION OF DISCOURSE SEGMENTS FOR IMPROVED ONLINE DIALOG ACT TAGGING.
Proc IEEE Int Conf Acoust Speech Signal Process. 2008;4518789:5033-5036. doi: 10.1109/ICASSP.2008.4518789.
7
Neurobiology of managing perceived stress.
J Natl Med Assoc. 2005 Apr;97(4):583-4.