Dyson Benjamin James, Asad Ahad
University of Alberta, Edmonton, AB, Canada.
University of Sussex, Falmer, UK.
NPJ Sci Learn. 2021 Jun 23;6(1):19. doi: 10.1038/s41539-021-00098-4.
We explored the possibility that longer-form expressions of reinforcement learning (win-calmness, loss-restlessness) can manifest across tasks only if they first develop through micro-transactions within tasks. We found no evidence of win-calmness or loss-restlessness when wins could not be maximised (unexploitable opponents) or when the threat of win minimisation was present (exploiting opponents), but we did find evidence of win-calmness (though not loss-restlessness) when wins could be maximised (exploitable opponents).