Lorberbaum J
Department of Psychiatry, Stanford University, CA 94305.
J Theor Biol. 1994 May 21;168(2):117-30. doi: 10.1006/jtbi.1994.1092.
Following the influential work of Axelrod, the repeated Prisoner's Dilemma game has become the theoretical gold standard for understanding the evolution of co-operative behavior among unrelated individuals. Using the game, several authors have found that a reciprocal strategy known as Tit for Tat (TFT) does quite well in a wide range of environments. TFT strategists start out co-operating and then do what the other player did on the previous move. Despite the success of TFT and similar strategies in experimental studies of the game, Boyd & Lorberbaum (1987, Nature, Lond. 327, 58) have shown that no pure strategy, including TFT, is evolutionarily stable, in the sense that each can be invaded by the joint effect of two invading strategies when long-term interaction occurs in the repeated game and future moves are discounted. Farrell & Ware (1989, Theor. Popul. Biol. 36, 161) have since extended these results to include finite mixes of pure strategies as well. Here, it is proven that no strategy is evolutionarily stable when long-term relationships are maintained in the repeated Prisoner's Dilemma and future moves are discounted. Specifically, it is shown that each completely probabilistic strategy (i.e. one that both co-operates and defects with positive probability after every sequence of behavior) may be invaded by a single deviant strategy. This completes the proof started by Boyd and Lorberbaum and extended by Farrell and Ware. This paper goes on to prove that no reactive strategy with a memory restricted to the opponent's preceding move is evolutionarily stable when there is no discounting of future moves. This is true despite the success of a more forgiving variant of TFT called GTFT in a recent tournament among reactive strategies conducted by Nowak & Sigmund (1992, Nature 355, 250), where future moves were not discounted. GTFT, for example, may be invaded by a pair of reactive mutants.
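The TFT rule and the discounting of future moves described above can be made concrete with a minimal sketch. It is not taken from the paper: the payoff values T = 5, R = 3, P = 1, S = 0 are the values commonly associated with Axelrod's tournaments and are assumed here, and the names `discounted_payoffs`, `tit_for_tat`, `always_defect`, `always_cooperate` and the discount parameter `w` are hypothetical. The infinite game is approximated by truncating after a fixed number of rounds.

```python
from typing import Callable, List, Tuple

# Assumed payoff values (not given in the abstract): temptation, reward,
# punishment, sucker's payoff, with T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# Payoffs (to first player, to second player) for each pair of moves.
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

# A pure strategy here maps the opponent's past moves to "C" or "D".
Strategy = Callable[[List[str]], str]


def tit_for_tat(opp_history: List[str]) -> str:
    """Co-operate first, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]


def always_defect(opp_history: List[str]) -> str:
    return "D"


def always_cooperate(opp_history: List[str]) -> str:
    return "C"


def discounted_payoffs(a: Strategy, b: Strategy, w: float,
                       rounds: int = 2000) -> Tuple[float, float]:
    """Approximate discounted payoffs sum_t w**t * payoff_t for both players.

    w is the discount factor (0 <= w < 1); the infinite sum is truncated
    after `rounds` moves, which is an approximation.
    """
    hist_a: List[str] = []
    hist_b: List[str] = []
    total_a = total_b = 0.0
    weight = 1.0
    for _ in range(rounds):
        move_a = a(hist_b)  # each player sees only the other's history
        move_b = b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += weight * pay_a
        total_b += weight * pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
        weight *= w
    return total_a, total_b
```

Under these assumed values and w = 0.9, TFT playing itself earns R/(1 - w) = 30, and an unconditional co-operator earns the same 30 against TFT, since neither ever defects. The sketch only illustrates the mechanics of the discounted game; it does not reproduce the invasion arguments of the paper.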
Since no strategy is evolutionarily stable when future moves are discounted in the repeated game, the restriction of strategy types to those actually maintained by mutation and by phenotypic and environmental variability in natural populations may be the key to understanding the evolution of co-operation. However, the result presented here, that the somewhat realistic reactive strategies are also not evolutionarily stable, at least in the non-discounted game, suggests something else may be going on. For one, the proof that no reactive strategy is evolutionarily stable ironically shows the robustness of TFT-like strategies. (ABSTRACT TRUNCATED AT 400 WORDS)
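A reactive strategy with memory of only the opponent's preceding move can be written as a pair (p, q): the probability of co-operating after the opponent co-operated and after the opponent defected. The following sketch, which is not from the paper, computes long-run average payoffs without discounting using the standard stationary co-operation levels for such pairs. It assumes the payoff values 5, 3, 1, 0, assumes |(p1 - q1)(p2 - q2)| < 1 so that the first move does not matter, and takes GTFT to be (1, 1/3), the value usually quoted for those payoffs. The names `cooperation_levels` and `long_run_payoff` are hypothetical.

```python
from typing import Tuple

# Assumed payoff values (not given in the abstract).
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# (p, q): probability of co-operating after the opponent's C and D.
Reactive = Tuple[float, float]

GTFT: Reactive = (1.0, 1.0 / 3.0)  # assumed value of GTFT for these payoffs
ALLD: Reactive = (0.0, 0.0)        # unconditional defector


def cooperation_levels(a: Reactive, b: Reactive) -> Tuple[float, float]:
    """Stationary co-operation probabilities (c1, c2) of a and b.

    Solves c1 = q1 + r1 * c2 and c2 = q2 + r2 * c1 with r = p - q.
    """
    (p1, q1), (p2, q2) = a, b
    r1, r2 = p1 - q1, p2 - q2
    if abs(r1 * r2) >= 1.0 - 1e-12:
        # e.g. TFT against TFT: the outcome depends on the first move.
        raise ValueError("stationary levels undefined when |r1 * r2| = 1")
    denom = 1.0 - r1 * r2
    c1 = (q1 + r1 * q2) / denom
    c2 = (q2 + r2 * q1) / denom
    return c1, c2


def long_run_payoff(a: Reactive, b: Reactive) -> float:
    """Average payoff per round to strategy a against b, no discounting."""
    c1, c2 = cooperation_levels(a, b)
    return (R * c1 * c2 + S * c1 * (1 - c2)
            + T * (1 - c1) * c2 + P * (1 - c1) * (1 - c2))
```

Under these assumptions, `long_run_payoff(ALLD, GTFT)` is 7/3, below the 3 that GTFT earns against itself, so this particular single mutant does not invade. The abstract's claim concerns invasion by a pair of reactive mutants, which this sketch does not construct.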