Evolutionary strategies of stochastic learning automata in the prisoner's dilemma.

Author information

Billard E A

Affiliation

Faculty of Computer Science and Engineering, University of Aizu, Fukushima, Japan.

Publication information

Biosystems. 1996;39(2):93-107. doi: 10.1016/0303-2647(96)01604-8.

DOI:10.1016/0303-2647(96)01604-8

PMID:8866046

Abstract

Stochastic learning automata (SLA) model stimulus-response species which receive feedback from the environment and adjust their mixed strategies in a Prisoner's Dilemma. A large heterogeneous population consists of SLA applying different strategies (i.e. different learning parameters) and other players applying deterministic strategies, Tit-For-Tat (TFT) or Always-Defect (ALLD). The predicted equilibria determine the payoffs within a generation for applying particular strategies and these equilibria are confirmed by simulation. The resultant population dynamics over many generations show that SLA with insensitive penalty responses strongly favor defection and dominate in subsequent generations over SLA with sensitive penalty responses. The SLA strategies are not evolutionarily stable as they can be invaded by TFT or ALLD. With the introduction of memory in the stimulus-response model, SLA learn to cooperate with TFT players.
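
To make the stimulus-response idea concrete, the following is a minimal sketch of a two-action learning automaton playing an iterated Prisoner's Dilemma against a Tit-For-Tat opponent. It is not the paper's model: the linear reward-penalty update, the payoff values (T=5, R=3, P=1, S=0), the rule that a round counts as a reward with probability payoff/T, and all names (`LearningAutomaton`, `reward_rate`, `penalty_rate`, `play_vs_tft`) are assumptions chosen for illustration.

```python
import random

# Assumed Prisoner's Dilemma payoffs with T > R > P > S (not from the paper).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


class LearningAutomaton:
    """Two-action automaton with an assumed linear reward-penalty update."""

    def __init__(self, reward_rate, penalty_rate, p_cooperate=0.5):
        self.a = reward_rate      # step size after a favorable response
        self.b = penalty_rate     # step size after an unfavorable response
        self.p = p_cooperate      # mixed strategy: probability of cooperating

    def act(self, rng):
        return "C" if rng.random() < self.p else "D"

    def update(self, action, rewarded):
        # q is the current probability of the action that was just played.
        q = self.p if action == "C" else 1.0 - self.p
        if rewarded:
            q += self.a * (1.0 - q)   # reinforce the chosen action
        else:
            q -= self.b * q           # weaken the chosen action
        self.p = q if action == "C" else 1.0 - q


def play_vs_tft(automaton, rounds, seed=0):
    """Return the automaton's average payoff against Tit-For-Tat."""
    rng = random.Random(seed)
    tft_move = "C"                    # TFT opens with cooperation
    total = 0
    for _ in range(rounds):
        move = automaton.act(rng)
        payoff = PAYOFF[(move, tft_move)]
        total += payoff
        # Assumption: feedback is favorable with probability payoff / T.
        rewarded = rng.random() < payoff / T
        automaton.update(move, rewarded)
        tft_move = move               # TFT copies the automaton's last move
    return total / rounds
```

Under these assumptions, running `play_vs_tft` with a small versus a large `penalty_rate` is one way to experiment with the "insensitive" and "sensitive" penalty responses that the abstract contrasts; the sketch has no memory and no population dynamics, so it should not be expected to reproduce the reported results.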