Suppr 超能文献


Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning.

Affiliation

Department of Computer Science and Engineering, Indian Institute of Technology Patna, India.

Publication

PLoS One. 2020 Jul 2;15(7):e0235367. doi: 10.1371/journal.pone.0235367. eCollection 2020.

DOI:10.1371/journal.pone.0235367
PMID:32614929
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7332012/
Abstract

PURPOSE

Developing a Dialogue/Virtual Agent (VA) that can handle complex tasks pertaining to multiple intents of a domain is challenging, as it requires the agent to deal with multiple subtasks simultaneously. However, the majority of end-to-end dialogue systems incorporate only user semantics as input to the learning process and ignore other useful user behavior and information. The sentiment of the user at the time of conversation plays an important role in securing maximum user gratification, so incorporating user sentiment during policy learning becomes even more crucial, particularly when serving composite tasks.

METHODOLOGY

As a first step towards enabling sentiment-aided VAs for multi-intent conversations, this paper proposes a new dataset, SentiVA, collected from open-sourced dialogue datasets and annotated with intent, slot, and sentiment labels (the sentiment labels taking the entire dialogue history into account). To integrate these multiple aspects, an options-based Hierarchical Reinforcement Learning (HRL) VA is proposed to learn strategies for managing multi-intent conversations. Along with task-success-based immediate rewards, sentiment-based immediate rewards are incorporated into the hierarchical value functions to make the VA adaptive to the user.
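
The reward shaping described above can be sketched as a single Q-learning-style update over options, in which the immediate reward is the sum of a task-success term and a sentiment term. This is a minimal illustration under stated assumptions, not the authors' implementation: the names (`SENTIMENT_REWARD`, `q_update`, the option labels) are invented for the example, the sentiment scores are arbitrary, and option durations are fixed at one turn for brevity.

```python
# Sketch: sentiment-shaped immediate reward in an options-based HRL update.
from collections import defaultdict

# Hypothetical scores a sentiment classifier might assign to the user's turn.
SENTIMENT_REWARD = {"positive": 0.5, "neutral": 0.0, "negative": -0.5}

def combined_reward(task_reward: float, sentiment: str) -> float:
    """Task-success reward plus a sentiment-based immediate reward."""
    return task_reward + SENTIMENT_REWARD[sentiment]

def q_update(q, state, option, reward, next_state, options,
             alpha=0.1, gamma=0.95):
    """One Q-learning step over options (option duration fixed at 1 turn)."""
    best_next = max(q[(next_state, o)] for o in options)
    q[(state, option)] += alpha * (reward + gamma * best_next - q[(state, option)])
    return q[(state, option)]

q = defaultdict(float)
options = ["book_flight", "book_hotel"]  # one option per intent (illustrative)
r = combined_reward(1.0, "negative")     # task succeeded, but user is annoyed
q_update(q, "s0", "book_flight", r, "s1", options)
```

The point of the sketch is the FINDINGS claim: with only the task term, a successful turn that frustrates the user is rewarded fully; the sentiment term discounts it, steering the learned policy toward user-adaptive behavior.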

FINDINGS

Empirically, the paper shows that task-based and sentiment-based immediate rewards are jointly required to ensure successful task completion and maximum user satisfaction in a multi-intent scenario; neither kind of reward alone suffices.

PRACTICAL IMPLICATIONS

The eventual evaluators and consumers of dialogue systems are users. Thus, ensuring a fulfilling conversational experience with maximum user satisfaction requires the VA to consider user sentiment at every time step in its decision-making policy.

ORIGINALITY

This work is the first attempt at incorporating sentiment-based rewards into an HRL framework.


Figures (pone.0235367, g001–g009):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/21da2651a3ea/pone.0235367.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/9f192311eaa8/pone.0235367.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/7188c2624c55/pone.0235367.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/c536a0e5d960/pone.0235367.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/76c38231c483/pone.0235367.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/cb96141a7a8e/pone.0235367.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/45e7e18ee0da/pone.0235367.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/995812f68a89/pone.0235367.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c9f/7332012/cd366c462dcf/pone.0235367.g009.jpg

Similar Articles

1
Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning.
PLoS One. 2020 Jul 2;15(7):e0235367. doi: 10.1371/journal.pone.0235367. eCollection 2020.
2
A dynamic goal adapted task oriented dialogue agent.
PLoS One. 2021 Apr 1;16(4):e0249030. doi: 10.1371/journal.pone.0249030. eCollection 2021.
3
An emotion-sensitive dialogue policy for task-oriented dialogue system.
Sci Rep. 2024 Aug 26;14(1):19759. doi: 10.1038/s41598-024-70463-x.
4
Reinforcing personalized persuasion in task-oriented virtual sales assistant.
PLoS One. 2023 Jan 5;18(1):e0275750. doi: 10.1371/journal.pone.0275750. eCollection 2023.
5
Deep learning for aspect-based sentiment analysis: a review.
PeerJ Comput Sci. 2022 Jul 19;8:e1044. doi: 10.7717/peerj-cs.1044. eCollection 2022.
6
Learning interaction dynamics with an interactive LSTM for conversational sentiment analysis.
Neural Netw. 2021 Jan;133:40-56. doi: 10.1016/j.neunet.2020.10.001. Epub 2020 Oct 21.
7
Lexicon-enhanced sentiment analysis framework using rule-based classification scheme.
PLoS One. 2017 Feb 23;12(2):e0171649. doi: 10.1371/journal.pone.0171649. eCollection 2017.
8
Hierarchical human-like strategy for aspect-level sentiment classification with sentiment linguistic knowledge and reinforcement learning.
Neural Netw. 2019 Sep;117:240-248. doi: 10.1016/j.neunet.2019.05.021. Epub 2019 Jun 3.
9
Assistive Conversational Agent for Health Coaching: A Validation Study.
Methods Inf Med. 2019 Jun;58(1):9-23. doi: 10.1055/s-0039-1688757. Epub 2019 May 22.
10
Cross-Domain Recommendation Based on Sentiment Analysis and Latent Feature Mapping.
Entropy (Basel). 2020 Apr 20;22(4):473. doi: 10.3390/e22040473.

Cited By

1
Neural response generation for task completion using conversational knowledge graph.
PLoS One. 2023 Feb 9;18(2):e0269856. doi: 10.1371/journal.pone.0269856. eCollection 2023.
2
A dynamic goal adapted task oriented dialogue agent.
PLoS One. 2021 Apr 1;16(4):e0249030. doi: 10.1371/journal.pone.0249030. eCollection 2021.

References

1
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
2
The generalisation of student's problems when several different population variances are involved.
Biometrika. 1947;34(1-2):28-35. doi: 10.1093/biomet/34.1-2.28.
3
Long short-term memory.
Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.