Billard E A
Faculty of Computer Science and Engineering, University of Aizu, Fukushima, Japan.
Biosystems. 1996;37(3):211-27. doi: 10.1016/0303-2647(95)01560-4.
Players in a Prisoner's Dilemma are modeled as learning automata that receive feedback from the environment and coadaptively adjust their strategies. Theory and simulations show the coevolutionary dynamics of the reward-inaction and reward-penalty schemes. The players are assumed to be physically distributed or, at least, in an environment where the effects of decisions are lagged. Such systems include biological and social systems in which instantaneous information is constrained or in which environmental responses do not necessarily reflect the true state of the system. Linear stability analysis determines the conditions for persistent oscillations in the players' mixed strategies. Using a parameterized stochastic version of the dilemma, the results indicate that if the environment modifies the payoffs, and thus 'releases' the prisoners from their dilemma, the prisoners become prone to instabilities in their strategies given sufficient delays. As in the original dilemma, the prisoners fail to coordinate their actions.
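The two learning schemes named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook linear update for a two-action automaton, in which setting the penalty rate b to 0 gives reward-inaction and setting b equal to the reward rate a gives a symmetric reward-penalty scheme. Everything else here is an assumption for illustration and is not taken from the paper: the reward probabilities in REWARD_PROB, the learning rates, the fixed feedback delay implemented as a queue, and the function names update and simulate.

```python
import random
from collections import deque


def update(p_coop, chose_coop, rewarded, a=0.05, b=0.0):
    """Linear update of one player's probability of cooperating.

    b = 0 gives reward-inaction; b = a gives symmetric reward-penalty.
    """
    # Work with the probability of the action that was actually chosen.
    p_chosen = p_coop if chose_coop else 1.0 - p_coop
    if rewarded:
        p_chosen += a * (1.0 - p_chosen)   # reinforce the chosen action
    else:
        p_chosen *= (1.0 - b)              # penalize it (no change if b == 0)
    return p_chosen if chose_coop else 1.0 - p_chosen


# Hypothetical reward probabilities, keyed by (I cooperate, other cooperates).
# They follow the usual dilemma ordering: temptation > reward > punishment > sucker.
REWARD_PROB = {
    (True, True): 0.6,
    (True, False): 0.1,
    (False, True): 0.9,
    (False, False): 0.3,
}


def simulate(steps=2000, delay=5, a=0.05, b=0.0, seed=0):
    """Two automata play repeatedly; feedback reaches them `delay` rounds late."""
    rng = random.Random(seed)
    p = [0.5, 0.5]        # each player's probability of cooperating
    pending = deque()     # outcomes not yet delivered to the players
    history = []
    for _ in range(steps):
        acts = [rng.random() < p[0], rng.random() < p[1]]
        outcomes = [
            rng.random() < REWARD_PROB[(acts[i], acts[1 - i])] for i in range(2)
        ]
        pending.append((acts, outcomes))
        if len(pending) > delay:
            old_acts, old_outcomes = pending.popleft()
            for i in range(2):
                p[i] = update(p[i], old_acts[i], old_outcomes[i], a, b)
        history.append((p[0], p[1]))
    return history
```

Calling simulate(delay=0, b=0.0) and simulate(delay=20, b=0.05) and plotting the returned histories is one way to compare the two schemes with and without lag. This sketch does not reproduce the paper's stability analysis or its parameterization of the payoffs.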