Department of Computer Science and Engineering, Indian Institute of Technology Patna, India.
PLoS One. 2020 Jul 2;15(7):e0235367. doi: 10.1371/journal.pone.0235367. eCollection 2020.
Developing a Dialogue/Virtual Agent (VA) that can handle complex tasks (needs) of a user pertaining to multiple intents of a domain is challenging, as it requires the agent to deal with multiple subtasks simultaneously. However, the majority of end-to-end dialogue systems incorporate only user semantics as input in the learning process and ignore other useful user behavior and information. The user's sentiment at the time of conversation plays an important role in securing maximum user gratification. Incorporating user sentiment during policy learning therefore becomes even more crucial, especially when serving composite tasks of the user.
As a first step towards enabling the development of a sentiment-aided VA for multi-intent conversations, this paper proposes a new dataset, SentiVA, compiled from open-source dialogue datasets and annotated with intent, slot, and sentiment labels (the last assigned considering the entire dialogue history). To integrate these multiple aspects, a Hierarchical Reinforcement Learning (HRL) based VA, specifically an options-based one, is proposed to learn strategies for managing multi-intent conversations. Along with task-success-based immediate rewards, sentiment-based immediate rewards are incorporated into the hierarchical value functions to make the VA user-adaptive.
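To make the reward design concrete, below is a minimal sketch, not the authors' implementation, of how a sentiment-based immediate reward might be combined with a task-based one inside an SMDP Q-learning update over options. The weight LAMBDA, all function names, and the specific update rule are illustrative assumptions.

```python
# Minimal sketch of combining task-based and sentiment-based immediate
# rewards in an options-based (SMDP) Q-learning update. All names and
# hyperparameter values are hypothetical, chosen for illustration only.

LAMBDA = 0.5   # hypothetical weight balancing task vs. sentiment reward
GAMMA = 0.99   # discount factor
ALPHA = 0.1    # learning rate

def combined_reward(task_reward: float, sentiment_reward: float) -> float:
    """Per-turn immediate reward mixing task success and user sentiment."""
    return task_reward + LAMBDA * sentiment_reward

def smdp_q_update(Q: dict, state, option, next_state, rewards, options) -> None:
    """One SMDP Q-learning step for an option that executed for k turns.

    `rewards` holds the combined per-turn rewards accrued while the option
    (a sub-dialogue policy for one intent) was running.
    """
    k = len(rewards)
    # Discounted return accumulated during the option's execution.
    ret = sum(GAMMA ** i * r for i, r in enumerate(rewards))
    best_next = max(Q.get((next_state, o), 0.0) for o in options)
    target = ret + GAMMA ** k * best_next
    old = Q.get((state, option), 0.0)
    Q[(state, option)] = old + ALPHA * (target - old)
```

Only the top-level (over-options) value update is sketched here; in an options framework, the intra-option policies would receive the same combined per-turn reward signal.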
Empirically, the paper shows that in a multi-intent scenario, task-based and sentiment-based immediate rewards are needed together, rather than either alone, to ensure successful task completion and attain maximum user satisfaction.
The eventual evaluators and consumers of dialogue systems are their users. Thus, ensuring a fulfilling conversational experience with maximum user satisfaction requires the VA to consider user sentiment at every time step in its decision-making policy.
This work is the first attempt at incorporating sentiment-based rewards into an HRL framework.