Unilateral incentive alignment in two-agent stochastic games.

Author information

McAvoy Alex, Madhushani Sehwag Udari, Hilbe Christian, Chatterjee Krishnendu, Barfuss Wolfram, Su Qi, Leonard Naomi Ehrich, Plotkin Joshua B

Affiliations

School of Data Science and Society, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.

Department of Mathematics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.

Publication information

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319927121. doi: 10.1073/pnas.2319927121. Epub 2025 Jun 16.

Abstract

Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advancements in evolutionary game theory have identified a class of "zero-determinant" strategies, which confer an agent with significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent ("enforcer") can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent's viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.
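The abstract's central idea is that one player can unilaterally pin a linear relationship between both players' payoffs. That idea is easiest to see in the original repeated-game setting that this paper generalizes. The sketch below is not the authors' stochastic-game algorithm. It is a minimal illustration of the classic memory-one zero-determinant construction of Press and Dyson for the repeated Prisoner's Dilemma, assuming the conventional payoffs R = 3, S = 0, T = 5, P = 1. The helper names `zd_strategy` and `stationary_payoffs` and the parameters `phi`, `chi` and `l` are hypothetical labels chosen for this example. Setting `chi = 1` gives a "fair" strategy that enforces equal payoffs, which is the case the abstract highlights as a proxy for fairness.

```python
import numpy as np

# Assumed Prisoner's Dilemma payoffs: reward, sucker, temptation, punishment.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

# Outcomes are indexed from X's point of view: CC, CD, DC, DD
# (first letter = X's move, second letter = Y's move).
S_X = np.array([R, S, T, P])  # X's payoff in each outcome
S_Y = np.array([R, T, S, P])  # Y's payoff in each outcome


def zd_strategy(phi, chi, l):
    """Hypothetical helper: memory-one ZD strategy for X enforcing
    pi_X - l = chi * (pi_Y - l).

    Press-Dyson style construction: p_tilde = phi * [(S_X - l) - chi * (S_Y - l)],
    and p = p_tilde + (1, 1, 0, 0) is X's cooperation probability after each outcome.
    """
    p_tilde = phi * ((S_X - l) - chi * (S_Y - l))
    p = p_tilde + np.array([1.0, 1.0, 0.0, 0.0])
    if np.any(p < -1e-12) or np.any(p > 1 + 1e-12):
        raise ValueError("phi, chi, l do not yield valid probabilities")
    return np.clip(p, 0.0, 1.0)


def stationary_payoffs(p, q):
    """Hypothetical helper: long-run average payoffs when X plays p and Y plays q.

    q[s] is the probability that Y cooperates after outcome s, with s indexed
    from X's point of view (CC, CD, DC, DD). Assumes the Markov chain is
    irreducible, so the stationary distribution is unique.
    """
    # Row s holds the transition probabilities out of outcome s.
    M = np.column_stack([
        p * q,              # next outcome CC
        p * (1 - q),        # next outcome CD
        (1 - p) * q,        # next outcome DC
        (1 - p) * (1 - q),  # next outcome DD
    ])
    # Solve v M = v with sum(v) = 1 by replacing one redundant balance equation.
    A = M.T - np.eye(4)
    A[-1, :] = 1.0
    b = np.array([0.0, 0.0, 0.0, 1.0])
    v = np.linalg.solve(A, b)
    return v @ S_X, v @ S_Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fair ZD strategy; chi = 1 makes l cancel. Here p = (1, 0.5, 0.5, 0).
    p_fair = zd_strategy(phi=0.1, chi=1.0, l=0.0)
    for _ in range(5):
        q = rng.uniform(0.05, 0.95, size=4)  # arbitrary memory-one opponent
        pi_x, pi_y = stationary_payoffs(p_fair, q)
        print(f"pi_X = {pi_x:.6f}  pi_Y = {pi_y:.6f}")  # equal up to rounding
```

The construction rests on one identity. In the stationary distribution v, X's probability of cooperating in the current round equals its probability of cooperating in the next round, so v · p̃ = 0. That forces the relation pi_X - l = chi * (pi_Y - l) whatever memory-one strategy q the opponent uses. With phi = 1/(T - S) = 0.2, the fair strategy becomes Tit-for-Tat, p = (1, 0, 1, 0). Against deterministic opponents such as Tit-for-Tat itself, however, the chain can then have several stationary distributions, and `stationary_payoffs`, which assumes a unique one, does not apply.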

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abe3/12207489/c445ef64b136/pnas.2319927121fig01.jpg
