A micro-genesis account of longer-form reinforcement learning in structured and unstructured environments.

Author information

Dyson Benjamin James, Asad Ahad

Affiliations

University of Alberta, Edmonton, AB, Canada.

University of Sussex, Falmer, UK.

Publication information

NPJ Sci Learn. 2021 Jun 23;6(1):19. doi: 10.1038/s41539-021-00098-4.

DOI:10.1038/s41539-021-00098-4

PMID:34162885

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8222288/

Abstract

We explored the possibility that in order for longer-form expressions of reinforcement learning (win-calmness, loss-restlessness) to manifest across tasks, they must first develop because of micro-transactions within tasks. We found no evidence of win-calmness or loss-restlessness when wins could not be maximised (unexploitable opponents), nor when the threat of win minimisation was presented (exploiting opponents), but evidence of win-calmness (but not loss-restlessness) when wins could be maximised (exploitable opponents).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb1e/8222288/b0bbf76d8986/41539_2021_98_Fig1_HTML.jpg

Similar articles

A micro-genesis account of longer-form reinforcement learning in structured and unstructured environments.

NPJ Sci Learn. 2021 Jun 23;6(1):19. doi: 10.1038/s41539-021-00098-4.

Breaking the bonds of reinforcement: Effects of trial outcome, rule consistency and rule complexity against exploitable and unexploitable opponents.

PLoS One. 2022 Feb 2;17(2):e0262249. doi: 10.1371/journal.pone.0262249. eCollection 2022.

Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning.

J Neurosci Methods. 2020 Jul 15;341:108777. doi: 10.1016/j.jneumeth.2020.108777. Epub 2020 May 15.

Do losses disguised as wins create a "sweet spot" for win overestimates in multiline slots play?

Addict Behav. 2021 Jan;112:106598. doi: 10.1016/j.addbeh.2020.106598. Epub 2020 Aug 3.

Behavioural and neural limits in competitive decision making: The roles of outcome, opponency and observation.

Biol Psychol. 2020 Jan;149:107778. doi: 10.1016/j.biopsycho.2019.107778. Epub 2019 Oct 5.

Improving working equine welfare in 'hard-win' situations, where gains are difficult, expensive or marginal.

PLoS One. 2018 Feb 6;13(2):e0191950. doi: 10.1371/journal.pone.0191950. eCollection 2018.

Failure generates impulsivity only when outcomes cannot be controlled.

J Exp Psychol Hum Percept Perform. 2018 Oct;44(10):1483-1487. doi: 10.1037/xhp0000557. Epub 2018 Jul 19.

Probabilistic reinforcement learning abnormalities and their correlates in adolescent bipolar disorders.

J Abnorm Psychol. 2018 Nov;127(8):807-817. doi: 10.1037/abn0000388.

Near wins prolong gambling on a video lottery terminal.

J Gambl Stud. 2003 Winter;19(4):433-8. doi: 10.1023/a:1026384011003.

Exp Aging Res. 2018 Mar-Apr;44(2):135-147. doi: 10.1080/0361073X.2017.1422474. Epub 2018 Jan 5.

Cited by

Assessing behavioural profiles following neutral, positive and negative feedback.

PLoS One. 2022 Jul 5;17(7):e0270475. doi: 10.1371/journal.pone.0270475. eCollection 2022.

References

Variability in competitive decision-making speed and quality against exploiting and exploitative opponents.

Sci Rep. 2021 Feb 3;11(1):2859. doi: 10.1038/s41598-021-82269-2.

Continuous decisions.

Philos Trans R Soc Lond B Biol Sci. 2021 Mar;376(1819):20190664. doi: 10.1098/rstb.2019.0664. Epub 2021 Jan 11.

Behavioural and neural interactions between objective and subjective performance in a Matching Pennies game.

Int J Psychophysiol. 2020 Jan;147:128-136. doi: 10.1016/j.ijpsycho.2019.11.002. Epub 2019 Nov 13.

Behavioural and neural limits in competitive decision making: The roles of outcome, opponency and observation.

Biol Psychol. 2020 Jan;149:107778. doi: 10.1016/j.biopsycho.2019.107778. Epub 2019 Oct 5.

Lesions of ventrolateral striatum eliminate lose-shift but not win-stay behaviour in rats.

Neurobiol Learn Mem. 2018 Nov;155:446-451. doi: 10.1016/j.nlm.2018.08.022. Epub 2018 Sep 1.

Failure generates impulsivity only when outcomes cannot be controlled.

J Exp Psychol Hum Percept Perform. 2018 Oct;44(10):1483-1487. doi: 10.1037/xhp0000557. Epub 2018 Jul 19.

Behavioural and neural modulation of win-stay but not lose-shift strategies as a function of outcome value in Rock, Paper, Scissors.

Sci Rep. 2016 Sep 23;6:33809. doi: 10.1038/srep33809.

How the threat of losses makes people explore more than the promise of gains.

Psychon Bull Rev. 2017 Jun;24(3):708-720. doi: 10.3758/s13423-016-1158-7.

Reward and punishment act as distinct factors in guiding behavior.

Cognition. 2015 Jun;139:154-67. doi: 10.1016/j.cognition.2015.03.005. Epub 2015 Mar 28.

Loss restlessness and gain calmness: durable effects of losses and gains on choice switching.

Psychon Bull Rev. 2015 Aug;22(4):1096-103. doi: 10.3758/s13423-014-0749-4.
