Unilateral incentive alignment in two-agent stochastic games.

Authors

McAvoy Alex, Madhushani Sehwag Udari, Hilbe Christian, Chatterjee Krishnendu, Barfuss Wolfram, Su Qi, Leonard Naomi Ehrich, Plotkin Joshua B

Affiliations

School of Data Science and Society, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.

Department of Mathematics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.

Publication information

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319927121. doi: 10.1073/pnas.2319927121. Epub 2025 Jun 16.

DOI:10.1073/pnas.2319927121

PMID:40523172

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12207489/

Abstract

Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advancements in evolutionary game theory have identified a class of "zero-determinant" strategies, which confer an agent with significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent ("enforcer") can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent's viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.
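To make the idea of unilaterally enforcing a linear payoff relationship concrete, the sketch below uses the classical repeated prisoner's dilemma, not the stochastic-game setting or the learning algorithm of the paper. It checks numerically that tit-for-tat, a well-known zero-determinant strategy, yields equal long-run payoffs against arbitrary memory-one opponents. The payoff values (R=3, S=0, T=5, P=1), the function names, and the power-iteration solver are illustrative assumptions and do not come from the source.

```python
import random

# One-shot prisoner's dilemma payoffs, indexed by state (CC, CD, DC, DD)
# with X's action listed first. The numbers are illustrative assumptions.
PAYOFF_X = [3.0, 0.0, 5.0, 1.0]
PAYOFF_Y = [3.0, 5.0, 0.0, 1.0]

# Tit-for-tat: cooperate exactly when the opponent cooperated last round.
TIT_FOR_TAT = [1.0, 0.0, 1.0, 0.0]


def transition_matrix(p, q):
    """Row-stochastic matrix over states (CC, CD, DC, DD).

    p[s] is the probability that X cooperates after state s (X's action first).
    q[s] is the probability that Y cooperates after state s as Y sees it
    (Y's action first), so CD and DC are swapped when Y looks up its move.
    """
    q_seen = [q[0], q[2], q[1], q[3]]
    rows = []
    for px, qy in zip(p, q_seen):
        rows.append([px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)])
    return rows


def stationary_distribution(matrix, steps=5000):
    """Approximate the stationary distribution by repeated multiplication.

    This assumes the chain is ergodic, which holds for the examples used here.
    """
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    return v


def long_run_payoffs(p, q):
    """Average per-round payoffs (pi_x, pi_y) in the undiscounted repeated game."""
    v = stationary_distribution(transition_matrix(p, q))
    pi_x = sum(vi * u for vi, u in zip(v, PAYOFF_X))
    pi_y = sum(vi * u for vi, u in zip(v, PAYOFF_Y))
    return pi_x, pi_y


if __name__ == "__main__":
    random.seed(1)
    for _ in range(3):
        # A random memory-one opponent with strictly interior probabilities.
        q = [random.uniform(0.1, 0.9) for _ in range(4)]
        pi_x, pi_y = long_run_payoffs(TIT_FOR_TAT, q)
        print(f"opponent={[round(x, 2) for x in q]}  pi_x={pi_x:.4f}  pi_y={pi_y:.4f}")
```

The two payoffs coincide because, under tit-for-tat, X's action in each round copies Y's action in the previous round, so the states CD and DC occur with equal long-run frequency whatever Y does. The opponent can change the common payoff level but not the equality itself, which is the kind of unilateral control the abstract describes.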


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abe3/12207489/c445ef64b136/pnas.2319927121fig01.jpg


Cited by

Collective artificial intelligence and evolutionary dynamics.

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2505860122. doi: 10.1073/pnas.2505860122. Epub 2025 Jun 16.


