

Policy search with rare significant events: Choosing the right partner to cooperate with.

Affiliations

Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique, ISIR, Paris, France.

Institut Jean Nicod, Département d'Études Cognitives, École Normale Supérieure, Paris, France.

Publication Information

PLoS One. 2022 Apr 26;17(4):e0266841. doi: 10.1371/journal.pone.0266841. eCollection 2022.

DOI: 10.1371/journal.pone.0266841
PMID: 35472212
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9041856/
Abstract

This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a gradient policy search method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient search methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.

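The core contrast in the abstract can be sketched on a toy version of the partner-choice task. This is a minimal illustration only, not the paper's actual experimental setup: the threshold policy, the parameter values, and the simple hill-climbing loop standing in for an evolution strategy are all assumptions made for the sketch.

```python
import random

random.seed(0)

def episode(threshold, coop_cut=0.95, horizon=100):
    # One episode of a toy partner-choice task: partners arrive with a
    # uniform "quality" in [0, 1]; only the rare ones above coop_cut
    # actually cooperate. Accepting any partner ends the episode, so at
    # most one positive reward is possible per episode.
    for _ in range(horizon):
        quality = random.random()
        if quality >= threshold:          # policy: accept if quality >= threshold
            return 1.0 if quality >= coop_cut else 0.0
    return 0.0                            # never accepted anyone

def es_search(start=0.5, pop=20, gens=30, sigma=0.1, evals=50):
    # Direct policy search in the spirit of an evolution strategy:
    # perturb the current best threshold, score each candidate by its
    # mean episodic return, and keep the best. Rare rewards only make
    # the fitness estimate noisier; they do not zero out the signal.
    best = start
    best_fit = sum(episode(best) for _ in range(evals)) / evals
    for _ in range(gens):
        for _ in range(pop):
            cand = min(1.0, max(0.0, best + random.gauss(0.0, sigma)))
            fit = sum(episode(cand) for _ in range(evals)) / evals
            if fit > best_fit:
                best, best_fit = cand, fit
    return best, best_fit

# Gradient scarcity: a REINFORCE-style score-function estimate is
# R * grad(log pi), so every zero-reward episode contributes an exactly
# zero gradient sample. At threshold 0.5 most episodes carry no signal.
zero_grad = sum(episode(0.5) == 0.0 for _ in range(1000)) / 1000

best, best_fit = es_search()
print(f"fraction of zero-gradient episodes at threshold 0.5: {zero_grad:.2f}")
print(f"ES-style best threshold: {best:.2f}, mean return: {best_fit:.2f}")
```

The asymmetry the paper describes shows up directly: the score-function estimator is multiplied by the return, so it vanishes on every zero-reward episode, while the direct search only ever compares whole-episode returns between candidates, so rarity of the reward degrades the signal-to-noise ratio but never erases the signal.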

Figures (from PMC9041856):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d152/9041856/5d6178018104/pone.0266841.g001.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d152/9041856/80d0a9887811/pone.0266841.g002.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d152/9041856/afa550d658a2/pone.0266841.g003.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d152/9041856/14b8fc4d1670/pone.0266841.g004.jpg
Fig 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d152/9041856/1ef4f3933467/pone.0266841.g005.jpg
Fig 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d152/9041856/af1ec361ffa5/pone.0266841.g006.jpg
Fig 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d152/9041856/121ccd0d1fe0/pone.0266841.g007.jpg

Similar Articles

1. Policy search with rare significant events: Choosing the right partner to cooperate with.
PLoS One. 2022 Apr 26;17(4):e0266841. doi: 10.1371/journal.pone.0266841. eCollection 2022.
2. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
3. Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty.
Sensors (Basel). 2022 Sep 25;22(19):7266. doi: 10.3390/s22197266.
4. A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers.
Int J Neural Syst. 2023 Dec;33(12):2350065. doi: 10.1142/S012906572350065X. Epub 2023 Oct 20.
5. Discovering diverse solutions in deep reinforcement learning by maximizing state-action-based mutual information.
Neural Netw. 2022 Aug;152:90-104. doi: 10.1016/j.neunet.2022.04.009. Epub 2022 Apr 16.
6. Curiosity-driven recommendation strategy for adaptive learning via deep reinforcement learning.
Br J Math Stat Psychol. 2020 Nov;73(3):522-540. doi: 10.1111/bmsp.12199. Epub 2020 Feb 21.
7. Deep Reinforcement Learning for the Detection of Abnormal Data in Smart Meters.
Sensors (Basel). 2022 Nov 6;22(21):8543. doi: 10.3390/s22218543.
8. Reinforcement Learning for Improving Agent Design.
Artif Life. 2019 Fall;25(4):352-365. doi: 10.1162/artl_a_00301. Epub 2019 Nov 7.
9. A Collaborative Multiagent Reinforcement Learning Method Based on Policy Gradient Potential.
IEEE Trans Cybern. 2021 Feb;51(2):1015-1027. doi: 10.1109/TCYB.2019.2932203. Epub 2021 Jan 15.
10. Diversity Evolutionary Policy Deep Reinforcement Learning.
Comput Intell Neurosci. 2021 Aug 3;2021:5300189. doi: 10.1155/2021/5300189. eCollection 2021.

References Cited in This Article

1. Nothing better to do? Environment quality and the evolution of cooperation by partner choice.
J Theor Biol. 2021 Oct 21;527:110805. doi: 10.1016/j.jtbi.2021.110805. Epub 2021 Jun 6.
2. Efficacy of Modern Neuro-Evolutionary Strategies for Continuous Control Optimization.
Front Robot AI. 2020 Jul 28;7:98. doi: 10.3389/frobt.2020.00098. eCollection 2020.
3. Social efficiency deficit deciphers social dilemmas.
Sci Rep. 2020 Sep 30;10(1):16092. doi: 10.1038/s41598-020-72971-y.
4. Policy search in continuous action domains: An overview.
Neural Netw. 2019 May;113:28-40. doi: 10.1016/j.neunet.2019.01.011. Epub 2019 Feb 5.
5. Non-kin cooperation in bats.
Philos Trans R Soc Lond B Biol Sci. 2016 Feb 5;371(1687):20150095. doi: 10.1098/rstb.2015.0095.
6. Partner choice creates fairness in humans.
Proc Biol Sci. 2015 Jun 7;282(1808):20150392. doi: 10.1098/rspb.2015.0392.
7. Partner choice promotes cooperation: the two faces of testing with agent-based models.
J Theor Biol. 2014 Mar 7;344:49-55. doi: 10.1016/j.jtbi.2013.11.019. Epub 2013 Dec 4.
8. Partner choice in nitrogen-fixation mutualisms of legumes and rhizobia.
Integr Comp Biol. 2002 Apr;42(2):369-80. doi: 10.1093/icb/42.2.369.
9. The coevolution of choosiness and cooperation.
Nature. 2008 Jan 10;451(7175):189-92. doi: 10.1038/nature06455.
10. Completely derandomized self-adaptation in evolution strategies.
Evol Comput. 2001 Summer;9(2):159-95. doi: 10.1162/106365601750190398.