Barros Pablo, Bloem Anne C, Hootsmans Inge M, Opheij Lena M, Toebosch Romain H A, Barakova Emilia, Sciutti Alessandra
Cognitive Architecture for Collaborative Technologies (CONTACT) Unit, Istituto Italiano di Tecnologia, Genova, Italy.
Department of Industrial Design, University of Technology Eindhoven, Eindhoven, Netherlands.
Front Robot AI. 2021 Jul 16;8:669990. doi: 10.3389/frobt.2021.669990. eCollection 2021.
Reinforcement learning simulation environments provide an important experimental test bed and facilitate data collection for developing AI-based robot applications. Most of them, however, focus on single-agent tasks, which limits their application to the development of social agents. This study proposes the Chef's Hat simulation environment, which implements a multi-agent competitive card game that is a complete reproduction of the homonymous board game, designed to provoke competitive strategies and emotional responses in humans. The game was shown to be ideal for developing personalized reinforcement learning in an online-learning closed-loop scenario, as its state representation is extremely dynamic and directly related to each of the opponents' actions. To adapt current reinforcement learning agents to this scenario, we also developed the COmPetitive Prioritized Experience Replay (COPPER) algorithm. With the help of COPPER and the Chef's Hat simulation environment, we evaluated the following: (1) 12 experimental learning agents, trained under four different regimens (self-play, play against a naive baseline, prioritized experience replay (PER), or COPPER) with three algorithms based on different state-of-the-art learning paradigms (PPO, DQN, and ACER), and two "dummy" baseline agents that take random actions; (2) the performance difference between COPPER and PER agents trained using the PPO algorithm and playing against different agents (PPO, DQN, and ACER) or all DQN agents; and (3) human performance when playing against two different collections of agents. Our experiments demonstrate that COPPER helps agents learn to adapt to different types of opponents, improving performance compared to offline learning models.
An additional contribution of the study is the formalization of the Chef's Hat competitive game and the implementation of the Chef's Hat Player Club, a collection of trained and assessed agents that serves as an enabler for embedding human competitive strategies in social, continual, and competitive reinforcement learning.
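The abstract does not detail COPPER's update rule, but its name suggests a prioritized experience replay buffer (Schaul et al.) adapted to competitive play. A minimal illustrative sketch, under the assumption that COPPER-style adaptation amounts to up-weighting transitions collected against the current opponent, might look like the following. The class name, `opponent_boost` parameter, and opponent-tagging scheme are hypothetical and not taken from the paper:

```python
import random

class CompetitivePrioritizedReplay:
    """Illustrative sketch only: a proportional prioritized replay buffer
    extended with an opponent tag, so transitions gathered against the
    current opponent are sampled more often during online adaptation."""

    def __init__(self, capacity=10000, alpha=0.6, opponent_boost=2.0):
        self.capacity = capacity
        self.alpha = alpha                    # how strongly priorities skew sampling
        self.opponent_boost = opponent_boost  # hypothetical extra weight for current-opponent data
        self.buffer = []                      # list of (transition, priority, opponent_id)

    def add(self, transition, td_error, opponent_id):
        # Standard PER proportional priority: |TD error|^alpha (plus epsilon).
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)                # drop the oldest transition when full
        self.buffer.append((transition, priority, opponent_id))

    def sample(self, batch_size, current_opponent):
        # Boost the sampling weight of transitions collected against
        # the opponent currently being played.
        weights = [p * (self.opponent_boost if oid == current_opponent else 1.0)
                   for _, p, oid in self.buffer]
        return random.choices(self.buffer, weights=weights, k=batch_size)
```

In this sketch, an agent facing a new opponent in the closed-loop scenario would call `sample(batch, current_opponent)` at each update, so recent experience against that specific opponent dominates learning, which is one plausible reading of how COPPER could improve adaptation over plain PER.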