McAvoy Alex, Madhushani Sehwag Udari, Hilbe Christian, Chatterjee Krishnendu, Barfuss Wolfram, Su Qi, Leonard Naomi Ehrich, Plotkin Joshua B
School of Data Science and Society, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.
Department of Mathematics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.
Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319927121. doi: 10.1073/pnas.2319927121. Epub 2025 Jun 16.
Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advancements in evolutionary game theory have identified a class of "zero-determinant" strategies, which confer an agent with significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent ("enforcer") can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent's viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.
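The "zero-determinant" idea behind the paper can be illustrated with the classic repeated prisoner's dilemma: Tit-for-Tat is the simplest zero-determinant strategy, and it enforces the equal-payoff relation s_x = s_y highlighted above as a proxy for fairness. The sketch below is illustrative only, not the paper's algorithm or its stochastic-game setting; the standard payoff values (R, S, T, P) = (3, 0, 5, 1) and the particular stochastic opponent are assumptions made for the example.

```python
import random

# Standard prisoner's-dilemma payoffs (an illustrative assumption; the
# paper treats general stochastic games, not just this one matrix).
R, S, T, P = 3, 0, 5, 1
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def tit_for_tat(opp_last):
    """Memory-one strategy p = (1, 0, 1, 0): cooperate first, then repeat
    the opponent's last move. It is a zero-determinant strategy enforcing
    the linear payoff relation s_x - s_y = 0 against any opponent."""
    return opp_last if opp_last is not None else "C"

def stochastic_opponent(rng):
    """A hypothetical opponent that cooperates 60% of the time,
    ignoring history."""
    return "C" if rng.random() < 0.6 else "D"

def play(rounds=10_000, seed=0):
    rng = random.Random(seed)
    opp_last, sx, sy = None, 0, 0
    for _ in range(rounds):
        a, b = tit_for_tat(opp_last), stochastic_opponent(rng)
        ux, uy = PAYOFF[(a, b)]
        sx, sy = sx + ux, sy + uy
        opp_last = b
    return sx / rounds, sy / rounds

avg_x, avg_y = play()
# The enforced relation s_x = s_y holds up to O(1/rounds): the cumulative
# score gap of Tit-for-Tat against any opponent is bounded by T - S.
print(avg_x, avg_y)
```

Because Tit-for-Tat mirrors the opponent's previous move, every round in which it is exploited (payoff S) is eventually matched by a round in which it exploits (payoff T), so the average payoffs converge regardless of the opponent's behavior. The paper's enforcer strategies generalize this unilateral control to arbitrary linear payoff relations in stochastic games.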