Newton P K, Ma Y
Department of Aerospace & Mechanical Engineering, Mathematics, and The Ellison Institute, University of Southern California, Los Angeles, California 90089-1191, USA.
Department of Physics & Astronomy, University of Southern California, Los Angeles, California 90089-1191, USA.
Phys Rev E. 2021 Jan;103(1-1):012304. doi: 10.1103/PhysRevE.103.012304.
The prisoner's dilemma (PD) game offers a simple paradigm of competition between two players who can either cooperate or defect. Since defection is a strict Nash equilibrium, it is an asymptotically stable state of the replicator dynamical system that uses the PD payoff matrix to define the fitness landscape of two interacting evolving populations. The dilemma arises from the fact that the average payoff of this asymptotically stable state is suboptimal: coaxing the players to cooperate would result in a higher payoff for both. Here we develop an optimal control theory for the prisoner's dilemma evolutionary game to maximize cooperation (minimize the defector population) over a given cycle time T, subject to constraints. Our two time-dependent controllers are applied to the off-diagonal elements of the payoff matrix in a bang-bang sequence that changes the game being played by adjusting the payoffs over time, with optimal timing that depends on the initial population distributions. Over multiple cycles nT (n > 1), the method is adaptive, as it uses the defector population at the end of the nth cycle to calculate the optimal schedule over the (n+1)st cycle. The control method, based on Pontryagin's maximum principle, can be viewed as determining the optimal way to dynamically alter incentives and penalties to maximize the probability of cooperation in settings that track dynamic changes in the frequency of strategists, with potential applications in evolutionary biology, economics, theoretical ecology, social sciences, reinforcement learning, and other fields where the replicator system is used.
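The mechanism described in the abstract can be illustrated with a minimal sketch of two-strategy replicator dynamics, written in terms of the defector fraction y. Everything specific here is an assumption made for illustration and is not taken from the paper: the payoff values R = 3, S = 0, T = 5, P = 1 (a conventional PD choice), the way the controls u1 and u2 are added to the off-diagonal entries, the control magnitudes, and the switch time. The schedule below is a hand-picked on/off example, not the optimal schedule derived from Pontryagin's maximum principle.

```python
# Conventional PD payoffs (assumed for illustration): T > R > P > S.
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def defector_rate(y, u1, u2):
    """Return dy/dt for the defector fraction y under replicator dynamics.

    Assumed controlled payoff matrix (rows: own strategy C, D):
        [[R,      S + u1],
         [T + u2, P     ]]
    u1 and u2 are hypothetical controls on the off-diagonal entries.
    """
    f_c = R * (1.0 - y) + (S + u1) * y   # cooperator fitness
    f_d = (T + u2) * (1.0 - y) + P * y   # defector fitness
    # Two-strategy replicator equation reduced to the defector fraction.
    return y * (1.0 - y) * (f_d - f_c)


def simulate(y0, schedule, t_end, dt=1e-3):
    """Integrate y with RK4, holding the controls constant within each step."""
    y = y0
    for k in range(int(round(t_end / dt))):
        u1, u2 = schedule(k * dt)
        k1 = defector_rate(y, u1, u2)
        k2 = defector_rate(y + 0.5 * dt * k1, u1, u2)
        k3 = defector_rate(y + 0.5 * dt * k2, u1, u2)
        k4 = defector_rate(y + dt * k3, u1, u2)
        y += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


def no_control(t):
    """Uncontrolled PD game: both controls are zero."""
    return (0.0, 0.0)


def bang_bang(t, t_switch=2.0, u1_on=2.0, u2_on=-3.0):
    """Hypothetical schedule: controls fully on until t_switch, then off."""
    return (u1_on, u2_on) if t < t_switch else (0.0, 0.0)


if __name__ == "__main__":
    cycle_time = 4.0  # illustrative cycle length
    print("defectors, uncontrolled:", simulate(0.5, no_control, cycle_time))
    print("defectors, bang-bang:   ", simulate(0.5, bang_bang, cycle_time))
```

With these assumed values, the uncontrolled game drives y toward 1 because the defector fitness always exceeds the cooperator fitness. While the controls are on, the sign of the fitness difference flips and y decreases; once they are switched off, y begins to grow again, which is why the timing of the switches matters.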