Lee Christopher, Harper Marc, Fryer Dashiell
Institute for Genomics and Proteomics, University of California, Los Angeles, CA, USA; Dept. of Chemistry & Biochemistry, University of California, Los Angeles, CA, USA; Dept. of Computer Science, University of California, Los Angeles, CA, USA; Molecular Biology Institute, University of California, Los Angeles, CA, USA.
Institute for Genomics and Proteomics, University of California, Los Angeles, CA, USA.
PLoS One. 2015 Mar 24;10(3):e0120625. doi: 10.1371/journal.pone.0120625. eCollection 2015.
We show that the history of play in a population game contains exploitable information that can be successfully used by sophisticated strategies to defeat memory-one opponents, including zero-determinant strategies. The history allows a player to label opponents by their strategies, enabling the player to infer the population distribution and to act differentially, based on the opponent's strategy, in each pairwise interaction. For the Prisoner's Dilemma, these advantages lead to the natural formation of cooperative coalitions among similarly behaving players and, eventually, to unilateral defection against opposing player types. We show analytically and empirically that optimal play in population games depends strongly on the population distribution. For example, the optimal strategy for a minority player type against a resident TFT population is ALLC, while for a majority player type the optimal strategy versus TFT players is ALLD. Such behaviors are not accessible to memory-one strategies. Drawing inspiration from Sun Tzu's The Art of War, we implemented a non-memory-one strategy for population games, based on techniques from machine learning and statistical inference, that can exploit the history of play in this manner. Via simulation we find that this strategy is essentially uninvadable and can successfully invade (i.e., fix significantly more often than a neutral mutant) essentially all known memory-one strategies for the Prisoner's Dilemma, including ALLC (always cooperate), ALLD (always defect), tit-for-tat (TFT), win-stay-lose-shift (WSLS), and zero-determinant (ZD) strategies, both extortionate and generous.
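The claim that the best response to a resident TFT population flips with the invader's share can be checked directly with round-robin payoff accounting in a well-mixed finite population. The following is a minimal Python sketch, not the authors' code; the standard payoff values (T=5, R=3, P=1, S=0), the population size N=100, and the game length of 200 rounds are assumptions made for illustration.

```python
# Average per-round payoffs in a finite population of iterated
# Prisoner's Dilemma players, illustrating that the best response to a
# resident TFT population depends on the invader's share.

# Row player's and column player's payoffs for one round (T=5, R=3, P=1, S=0).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def allc(opp_history):   # always cooperate
    return "C"

def alld(opp_history):   # always defect
    return "D"

def tft(opp_history):    # tit-for-tat: cooperate first, then copy opponent
    return opp_history[-1] if opp_history else "C"

def avg_payoff(s1, s2, rounds=200):
    """Average per-round payoff to s1 when s1 plays s2 for `rounds` rounds."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)   # each strategy sees the opponent's history
        total += PAYOFF[(m1, m2)][0]
        h1.append(m1)
        h2.append(m2)
    return total / rounds

def population_payoffs(mutant, k, resident=tft, N=100):
    """Per-round payoffs of one mutant (k copies present) and one resident
    in a well-mixed population of size N, averaged over all opponents."""
    mut = ((k - 1) * avg_payoff(mutant, mutant)
           + (N - k) * avg_payoff(mutant, resident)) / (N - 1)
    res = (k * avg_payoff(resident, mutant)
           + (N - k - 1) * avg_payoff(resident, resident)) / (N - 1)
    return mut, res

for name, strat in (("ALLC", allc), ("ALLD", alld)):
    for k in (1, 99):  # lone invader vs. near-fixation majority
        mut, res = population_payoffs(strat, k)
        print(f"{name} with k={k:2d}: mutant {mut:.3f}  resident TFT {res:.3f}")
```

Under these assumptions, a lone ALLD mutant earns far less than the TFT residents (it is punished by TFT in nearly every interaction), while an ALLD majority out-earns the remaining TFT players; ALLC is payoff-neutral against TFT at any frequency. This reproduces the abstract's point that the optimal play against TFT depends on the population distribution, a dependence no memory-one strategy can condition on.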