Learning, exploitation and bias in games.

Affiliations

School of Mathematics, University of Bristol, Bristol, United Kingdom.

School of Biological Sciences, University of Bristol, Bristol, United Kingdom.

Publication information

PLoS One. 2021 Feb 5;16(2):e0246588. doi: 10.1371/journal.pone.0246588. eCollection 2021.

DOI:10.1371/journal.pone.0246588
PMID:33544782
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7864454/
Abstract

We focus on learning during development in a group of individuals that play a competitive game with each other. The game has two actions and there is negative frequency dependence. We define the distribution of actions by group members to be an equilibrium configuration if no individual can improve its payoff by unilaterally changing its action. We show that at this equilibrium, one action is preferred in the sense that those taking the preferred action have a higher payoff than those taking the other, more prosocial, action. We explore the consequences of a simple 'unbiased' reinforcement learning rule during development, showing that groups reach an approximate equilibrium distribution, so that some achieve a higher payoff than others. Because there is learning, an individual's behaviour can influence the future behaviour of others. We show that, as a consequence, there is the potential for an individual to exploit others by influencing them to be the ones to take the non-preferred action. Using an evolutionary simulation, we show that population members can avoid being exploited by over-valuing rewards obtained from the preferred option during learning, an example of a bias that is 'rational'.
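
The equilibrium condition and the learning rule described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the payoff functions (`payoff_a` shares a hypothetical resource `V` equally among those taking the preferred action A, `payoff_b` returns a fixed amount `B`), the epsilon-greedy learners, and the `bias` multiplier that inflates rewards from the preferred action during learning (`bias=1.0` corresponds to an unbiased rule).

```python
import random

V = 10.0  # hypothetical resource shared equally by everyone taking action A
B = 3.0   # hypothetical fixed payoff for taking action B


def payoff_a(k):
    """Payoff to each A-player when k >= 1 individuals take action A."""
    return V / k


def payoff_b(k):
    """Payoff to each B-player; constant in this toy game."""
    return B


def is_equilibrium(k, n):
    """True if, with k of n individuals taking A, nobody gains by switching alone."""
    # An A-player who switches becomes a B-player in a group with k - 1 A-players.
    a_stays = k == 0 or payoff_a(k) >= payoff_b(k - 1)
    # A B-player who switches becomes an A-player in a group with k + 1 A-players.
    b_stays = k == n or payoff_b(k) >= payoff_a(k + 1)
    return a_stays and b_stays


def equilibria(n):
    """All equilibrium counts of A-players for a group of size n."""
    return [k for k in range(n + 1) if is_equilibrium(k, n)]


def simulate(n=10, rounds=5000, alpha=0.05, epsilon=0.1, bias=1.0, seed=0):
    """Epsilon-greedy reinforcement learners (an assumed rule, not the paper's).

    Returns how many individuals value A above B after the final round.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n)]  # q[i][0]: value of A, q[i][1]: value of B
    for _ in range(rounds):
        actions = []
        for i in range(n):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))  # explore: random action
            else:
                actions.append(0 if q[i][0] >= q[i][1] else 1)  # exploit
        k = actions.count(0)  # number of individuals taking A this round
        for i, a in enumerate(actions):
            reward = bias * payoff_a(k) if a == 0 else payoff_b(k)
            q[i][a] += alpha * (reward - q[i][a])  # move estimate toward reward
    return sum(1 for qi in q if qi[0] > qi[1])
```

With these assumed payoffs and a group of 10, `equilibria(10)` returns `[3]`, and each of the three individuals taking action A earns more than those taking action B, which matches the qualitative claim in the abstract. `simulate()` gives only a rough sense of where such learners settle; its result depends on the seed and parameters.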

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e139/7864454/99c44d5033ac/pone.0246588.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e139/7864454/e28562a999bc/pone.0246588.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e139/7864454/64daa6d997b3/pone.0246588.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e139/7864454/a55fe4784518/pone.0246588.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e139/7864454/bd3e61e28c95/pone.0246588.g005.jpg