

Intonation and dialog context as constraints for speech recognition.

Author Information

Taylor P, King S, Isard S, Wright H

Affiliation

Center for Speech Technology Research, University of Edinburgh, U.K.

Publication Information

Lang Speech. 1998 Jul-Dec;41(Pt 3-4):493-512.

DOI: 10.1177/002383099804100411
PMID: 10746367
Abstract

This paper describes a way of using intonation and dialog context to improve the performance of an automatic speech recognition (ASR) system. Our experiments were run on the DCIEM Maptask corpus, a corpus of spontaneous task-oriented dialog speech. This corpus has been tagged according to a dialog analysis scheme that assigns each utterance to one of 12 "move types," such as "acknowledge," "query-yes/no" or "instruct." Most ASR systems use a bigram language model to constrain the possible sequences of words that might be recognized. Here we use a separate bigram language model for each move type. We show that when the "correct" move-specific language model is used for each utterance in the test set, the word error rate of the recognizer drops. Of course when the recognizer is run on previously unseen data, it cannot know in advance what move type the speaker has just produced. To determine the move type we use an intonation model combined with a dialog model that puts constraints on possible sequences of move types, as well as the speech recognizer likelihoods for the different move-specific models. In the full recognition system, the combination of automatic move type recognition with the move specific language models reduces the overall word error rate by a small but significant amount when compared with a baseline system that does not take intonation or dialog acts into account. Interestingly, the word error improvement is restricted to "initiating" move types, where word recognition is important. In "response" move types, where the important information is conveyed by the move type itself--for example, positive versus negative response--there is no word error improvement, but recognition of the response types themselves is good. The paper discusses the intonation model, the language models, and the dialog model in detail and describes the architecture in which they are combined.
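The abstract describes combining three knowledge sources to identify the move type of an utterance: a dialog model constraining move-type sequences, an intonation model, and the recognizer likelihoods under the move-specific language models. The combination can be illustrated as a toy sketch in the log-probability domain. All names and numbers below are hypothetical illustrations, not the paper's actual trained models (which were estimated on the DCIEM Maptask corpus):

```python
import math

# Hypothetical toy values; each score is a log-probability.
MOVE_TYPES = ["acknowledge", "query-yes/no", "instruct"]

# Dialog model: P(current move | previous move), a bigram over move types.
dialog_model = {
    ("instruct", "acknowledge"): math.log(0.6),
    ("instruct", "query-yes/no"): math.log(0.3),
    ("instruct", "instruct"): math.log(0.1),
}

# Intonation model: P(intonation features | move) for the current utterance.
intonation_likelihood = {
    "acknowledge": math.log(0.5),
    "query-yes/no": math.log(0.2),
    "instruct": math.log(0.3),
}

# Recognizer likelihood of the utterance under each move-specific
# bigram language model.
lm_likelihood = {
    "acknowledge": math.log(0.4),
    "query-yes/no": math.log(0.1),
    "instruct": math.log(0.5),
}

def best_move(prev_move):
    """Sum the three log-scores for each candidate move type and
    return the highest-scoring one."""
    scores = {
        m: dialog_model[(prev_move, m)]
           + intonation_likelihood[m]
           + lm_likelihood[m]
        for m in MOVE_TYPES
    }
    return max(scores, key=scores.get)

print(best_move("instruct"))  # prints "acknowledge"
```

Once the move type is chosen, the word string recognized under that move's language model is taken as the final hypothesis, which is how the move-specific models can lower word error rate relative to a single shared bigram model.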


Similar Articles

1. Can prosody aid the automatic classification of dialog acts in conversational speech? Lang Speech. 1998 Jul-Dec;41(Pt 3-4):443-92. doi: 10.1177/002383099804100410.
2. The interaction of lexical tone, intonation and semantic context in on-line spoken word recognition: an ERP study on Cantonese Chinese. Neuropsychologia. 2014 Jan;53:293-309. doi: 10.1016/j.neuropsychologia.2013.11.020. Epub 2013 Dec 4.
3. Production and perception of speech intonation in pediatric cochlear implant recipients and individuals with normal hearing. Ear Hear. 2008 Jun;29(3):336-51. doi: 10.1097/AUD.0b013e318168d94d.
4. An interaction between prosody and statistics in the segmentation of fluent speech. Cogn Psychol. 2007 Feb;54(1):1-32. doi: 10.1016/j.cogpsych.2006.04.002. Epub 2006 Jun 19.
5. Effects of compatible versus competing rhythmic grouping on errors and timing variability in speech. Lang Speech. 2014 Dec;57(Pt 4):544-62. doi: 10.1177/0023830913512776.
6. Variation in the speech signal as a window into the cognitive architecture of language production. Psychon Bull Rev. 2018 Dec;25(6):1973-2004. doi: 10.3758/s13423-017-1423-4.
7. A comparison of automatic and human speech recognition in null grammar. J Acoust Soc Am. 2012 Mar;131(3):EL256-61. doi: 10.1121/1.3684744.
8. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. J Acoust Soc Am. 2003 Feb;113(2):1001-24. doi: 10.1121/1.1534836.
9. Pitch cues for the recognition of yes-no questions in French. J Psycholinguist Res. 2006 Sep;35(5):427-45. doi: 10.1007/s10936-006-9023-x.

Cited By

1. Exploiting Acoustic and Syntactic Features for Automatic Prosody Labeling in a Maximum Entropy Framework. IEEE Trans Audio Speech Lang Process. 2008;16(4):797-811. doi: 10.1109/TASL.2008.917071.