SAIF: A Correction-Detection Deep-Learning Architecture for Personal Assistants.

Affiliation

Data Science Center, Ariel University, Ariel 40700, Israel.

Publication information

Sensors (Basel). 2020 Sep 29;20(19):5577. doi: 10.3390/s20195577.

DOI: 10.3390/s20195577
PMID: 33003380
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7582502/
Abstract

Intelligent agents that can interact with users using natural language are becoming increasingly common. Sometimes an intelligent agent may not correctly understand a user command or may not perform it properly. In such cases, the user might try a second time by giving the agent another, slightly different command. Giving an agent the ability to detect such user corrections might help it fix its own mistakes and avoid making them in the future. In this work, we consider the problem of automatically detecting user corrections using deep learning. We develop a multimodal architecture called SAIF, which detects such user corrections, taking as inputs the user's voice commands as well as their transcripts. Voice inputs allow SAIF to take advantage of sound cues, such as tone, speed, and word emphasis. In addition to sound cues, our model uses transcripts to determine whether a command is a correction to the previous command. Our model also obtains internal input from the agent, indicating whether the previous command was executed successfully or not. Finally, we release a unique dataset in which users interacted with an intelligent agent assistant, by giving it commands. This dataset includes labels on pairs of consecutive commands, which indicate whether the latter command is in fact a correction of the former command. We show that SAIF outperforms current state-of-the-art methods on this dataset.
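
The abstract describes three inputs (audio of the commands, their transcripts, and an internal success flag from the agent) and labels on pairs of consecutive commands. As a rough illustration only, the sketch below fuses one hand-made feature per input into a logistic score. It is not SAIF and is not a deep-learning model: the `Command` fields, the features (word overlap, slowdown, loudness increase), and the weights and bias are all hypothetical stand-ins chosen for this example, not values from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Command:
    transcript: str      # text of the spoken command
    duration_s: float    # hypothetical audio feature: utterance length in seconds
    mean_energy: float   # hypothetical audio feature: average loudness


def text_overlap(prev: Command, curr: Command) -> float:
    """Jaccard similarity of the two transcripts' word sets (0.0 to 1.0)."""
    a = set(prev.transcript.lower().split())
    b = set(curr.transcript.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def speaking_rate(cmd: Command) -> float:
    """Words per second; 0.0 for a zero-length utterance."""
    words = len(cmd.transcript.split())
    return words / cmd.duration_s if cmd.duration_s > 0 else 0.0


def correction_score(prev: Command, curr: Command, prev_succeeded: bool,
                     weights=(3.0, 1.0, 1.0, 2.0), bias=-3.0) -> float:
    """Toy late fusion: weighted sum of per-input features through a sigmoid.

    The weights and bias are hand-picked assumptions, not learned values.
    """
    slowdown = max(0.0, speaking_rate(prev) - speaking_rate(curr))
    louder = max(0.0, curr.mean_energy - prev.mean_energy)
    failed = 0.0 if prev_succeeded else 1.0
    features = (text_overlap(prev, curr), slowdown, louder, failed)
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# A pair of consecutive commands, as in the dataset the abstract describes.
first = Command("send an email to Tom", duration_s=2.0, mean_energy=0.5)
retry = Command("send an email to Tim", duration_s=2.5, mean_energy=0.7)
other = Command("play some jazz music", duration_s=1.5, mean_energy=0.5)

likely_correction = correction_score(first, retry, prev_succeeded=False)
likely_new_command = correction_score(first, other, prev_succeeded=True)
```

With these made-up inputs, the slower, louder, near-identical retry after a failed command scores above 0.5, while the unrelated follow-up after a successful command scores below it. A learned model such as the one the abstract describes would replace both the hand-made features and the fixed weights.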

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53b4/7582502/f4160354d224/sensors-20-05577-g001.jpg

Cited by

1
Conversational Agents: Goals, Technologies, Vision and Challenges.
Sensors (Basel). 2021 Dec 17;21(24):8448. doi: 10.3390/s21248448.
