Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique, ISIR, Paris, France.
Institut Jean Nicod, Département d'Études Cognitives, École Normale Supérieure, Paris, France.
PLoS One. 2022 Apr 26;17(4):e0266841. doi: 10.1371/journal.pone.0266841. eCollection 2022.
This paper focuses on a class of reinforcement learning problems where significant events are rare and limited to a single positive reward per episode. A typical example is that of an agent who has to choose a partner to cooperate with, while a large number of partners are simply not interested in cooperating, regardless of what the agent has to offer. We address this problem in a continuous state and action space with two different kinds of search methods: a policy gradient method and a direct policy search method using an evolution strategy. We show that when significant events are rare, gradient information is also scarce, making it difficult for policy gradient methods to find an optimal policy, with or without a deep neural architecture. On the other hand, we show that direct policy search methods are invariant to the rarity of significant events, which is yet another confirmation of the unique role evolutionary algorithms have to play as a reinforcement learning method.
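The contrast described in the abstract can be illustrated with a minimal toy sketch (not the paper's code; all names, constants, and the task below are illustrative assumptions): a REINFORCE-style policy gradient update is identically zero on every episode whose return is zero, so rare rewards leave most updates uninformative, whereas a rank-based evolution strategy only needs the ordering of candidate fitnesses over whole episodes.

```python
# Minimal sketch (illustrative, not the paper's implementation): contrasts a
# REINFORCE-style policy gradient update with a simple rank-based evolution
# strategy on a toy task where the only non-zero reward is rare.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(action, target=2.0, width=0.05):
    # "Significant event": reward 1 only if the action lands in a narrow
    # interval around the target; otherwise the episode return is exactly 0.
    return 1.0 if abs(action - target) < width else 0.0

# --- Policy gradient (REINFORCE) on a Gaussian policy a ~ N(mu, sigma^2) ---
mu, sigma, lr = 0.0, 1.0, 0.1
for step in range(200):
    a = rng.normal(mu, sigma)
    R = episode_return(a)
    # Gradient of log N(a; mu, sigma) w.r.t. mu is (a - mu) / sigma^2.
    # Whenever R == 0 (the vast majority of episodes) this update is zero,
    # which is the "scarce gradient information" issue described above.
    mu += lr * R * (a - mu) / sigma**2

# --- Direct policy search: rank-based (mu/mu, lambda)-ES on the policy mean ---
theta, step_size, pop = 0.0, 1.0, 16
for gen in range(200):
    noise = rng.normal(size=pop)
    candidates = theta + step_size * noise
    fitness = np.array([episode_return(c) for c in candidates])
    # Truncation selection depends only on the ordering of fitness values,
    # so a single successful candidate in the population is enough to pull
    # the search distribution toward the rewarded region.
    elite = candidates[np.argsort(fitness)[-pop // 4:]]
    theta = elite.mean()

print(f"REINFORCE mean after training: {mu:.3f}")
print(f"ES mean after training:        {theta:.3f}")
```

The sketch is deliberately one-dimensional and omits step-size adaptation; it is only meant to make the abstract's argument concrete, not to reproduce the paper's experiments.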