
You Were Always on My Mind: Introducing Chef's Hat and COPPER for Personalized Reinforcement Learning.

Authors

Barros Pablo, Bloem Anne C, Hootsmans Inge M, Opheij Lena M, Toebosch Romain H A, Barakova Emilia, Sciutti Alessandra

Affiliations

Cognitive Architecture for Collaborative Technologies (CONTACT) Unit, Istituto Italiano di Tecnologia, Genova, Italy.

Department of Industrial Design, Eindhoven University of Technology, Eindhoven, Netherlands.

Publication

Front Robot AI. 2021 Jul 16;8:669990. doi: 10.3389/frobt.2021.669990. eCollection 2021.

DOI: 10.3389/frobt.2021.669990
PMID: 34336935
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8323774/
Abstract

Reinforcement learning simulation environments provide an important experimental test bed and facilitate data collection for developing AI-based robot applications. Most of them, however, focus on single-agent tasks, which limits their application to the development of social agents. This study proposes the Chef's Hat simulation environment, which implements a multi-agent competitive card game that is a complete reproduction of the homonymous board game, designed to provoke competitive strategies and emotional responses in humans. The game was shown to be ideal for developing personalized reinforcement learning in an online learning closed-loop scenario, as its state representation is extremely dynamic and directly related to each of the opponent's actions. To adapt current reinforcement learning agents to this scenario, we also developed the COmPetitive Prioritized Experience Replay (COPPER) algorithm. With the help of COPPER and the Chef's Hat simulation environment, we evaluated the following: (1) 12 experimental learning agents, trained with four different regimens (self-play, play against a naive baseline, prioritized experience replay (PER), or COPPER) using three algorithms based on different state-of-the-art learning paradigms (PPO, DQN, and ACER), plus two "dummy" baseline agents that take random actions; (2) the performance difference between COPPER and PER agents trained using the PPO algorithm and playing against different agents (PPO, DQN, and ACER) or all DQN agents; and (3) human performance when playing against two different collections of agents. Our experiments demonstrate that COPPER helps agents learn to adapt to different types of opponents, improving performance when compared to off-line learning models. An additional contribution of the study is the formalization of the Chef's Hat competitive game and the implementation of the Chef's Hat Player Club, a collection of trained and assessed agents that enables embedding human competitive strategies in social, continual, and competitive reinforcement learning.
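COPPER builds on Prioritized Experience Replay (PER), which replays transitions in proportion to how surprising they were rather than uniformly. The paper's exact competitive prioritization is not reproduced here; the following is only a minimal sketch of a proportional PER buffer (all names and constants are illustrative, not the authors' implementation):

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer (illustrative sketch only)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.buffer = []          # stored transitions
        self.priorities = []      # one priority per transition

    def add(self, transition, td_error=1.0):
        # New experiences get a priority derived from their TD error,
        # so surprising transitions are replayed more often.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            # Drop the oldest transition when the buffer is full.
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return [self.buffer[i] for i in idxs], idxs

    def update_priorities(self, idxs, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```

In the competitive setting described by the paper, COPPER additionally conditions this prioritization on the opponents being faced, which is what lets agents adapt online to different opponent types; the details of that extension are in the full text.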


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/b5ab4e161c2e/frobt-08-669990-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/045e4318f7f3/frobt-08-669990-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/fa97c02a59ad/frobt-08-669990-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/6f1fa381b9f4/frobt-08-669990-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/0e0459fa3408/frobt-08-669990-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/9beb9f5a84f0/frobt-08-669990-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/384645019744/frobt-08-669990-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a959/8323774/50cacc95a973/frobt-08-669990-g008.jpg

Similar Articles

1. You Were Always on My Mind: Introducing Chef's Hat and COPPER for Personalized Reinforcement Learning.
Front Robot AI. 2021 Jul 16;8:669990. doi: 10.3389/frobt.2021.669990. eCollection 2021.
2. All by Myself: Learning individualized competitive behavior with a contrastive reinforcement learning optimization.
Neural Netw. 2022 Jun;150:364-376. doi: 10.1016/j.neunet.2022.03.013. Epub 2022 Mar 18.
3. Enhancing Stability and Performance in Mobile Robot Path Planning with PMR-Dueling DQN Algorithm.
Sensors (Basel). 2024 Feb 27;24(5):1523. doi: 10.3390/s24051523.
4. The 'chef's hat' appearance of the femoral head in cleidocranial dysplasia.
J Bone Joint Surg Br. 2000 Apr;82(3):404-8. doi: 10.1302/0301-620x.82b3.9919.
5. Multi-robot task allocation in e-commerce RMFS based on deep reinforcement learning.
Math Biosci Eng. 2023 Jan;20(2):1903-1918. doi: 10.3934/mbe.2023087. Epub 2022 Nov 8.
6. Multi-agent reinforcement learning with approximate model learning for competitive games.
PLoS One. 2019 Sep 11;14(9):e0222215. doi: 10.1371/journal.pone.0222215. eCollection 2019.
7. Self-Paced Prioritized Curriculum Learning With Coverage Penalty in Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2216-2226. doi: 10.1109/TNNLS.2018.2790981.
8. Deep reinforcement learning for automated radiation adaptation in lung cancer.
Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
9. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning.
Neural Netw. 2018 Nov;107:3-11. doi: 10.1016/j.neunet.2017.12.012. Epub 2018 Jan 11.
10. Reinforcement learning and decision making in monkeys during a competitive game.
Brain Res Cogn Brain Res. 2004 Dec;22(1):45-58. doi: 10.1016/j.cogbrainres.2004.07.007.

References Cited in This Article

1. A Socially Adaptable Framework for Human-Robot Interaction.
Front Robot AI. 2020 Oct 19;7:121. doi: 10.3389/frobt.2020.00121. eCollection 2020.
2. Active Inference in OpenAI Gym: A Paradigm for Computational Investigations Into Psychiatric Illness.
Biol Psychiatry Cogn Neurosci Neuroimaging. 2018 Sep;3(9):809-818. doi: 10.1016/j.bpsc.2018.06.010. Epub 2018 Jul 10.
3. Social Cognition for Human-Robot Symbiosis-Challenges and Building Blocks.
Front Neurorobot. 2018 Jul 11;12:34. doi: 10.3389/fnbot.2018.00034. eCollection 2018.