Contingency, contiguity, and causality in conditioning: Applying information theory and Weber's Law to the assignment of credit problem.

Affiliation

Department of Psychology.

Publication information

Psychol Rev. 2019 Oct;126(5):761-773. doi: 10.1037/rev0000163. Epub 2019 Aug 29.

Abstract

Contingency is a critical concept for theories of associative learning and the assignment of credit problem in reinforcement learning. Measuring and manipulating it has, however, been problematic. The information-theoretic definition of contingency, normalized mutual information, makes it a readily computed property of the relation between reinforcing events, the stimuli that predict them and the responses that produce them. When necessary, the dynamic range of the required temporal representation divided by the Weber fraction gives a psychologically realistic plug-in estimate of the entropies. There is no measurable prospective contingency between a peck and reinforcement when pigeons peck on a variable interval schedule of reinforcement. There is, however, a perfect retrospective contingency between reinforcement and the immediately preceding peck. Degrading the retrospective contingency by gratis reinforcement reveals a critical value (.25), below which performance declines rapidly. Contingency is time scale invariant, whereas the perception of proximate causality depends, we assume, on there being a short, fixed, psychologically negligible critical interval between cause and effect. Increasing the interval between a response and reinforcement that it triggers degrades the retrograde contingency, leading to a decline in performance that restores it to at or above its critical value. Thus, there is no critical interval in the retrospective effect of reinforcement. We conclude with a short review of the broad explanatory scope of information-theoretic contingencies when regarded as causal variables in conditioning. We suggest that the computation of contingencies may supplant the computation of the sum of all future rewards in models of reinforcement learning. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
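
To make the phrase "normalized mutual information" concrete, the following is a minimal sketch of how such a contingency could be computed from a discrete joint probability table. The abstract does not state the normalizer, so dividing by the entropy of the predicted variable is an assumption made here for illustration. The function names `entropy` and `contingency` are hypothetical, and the sketch does not attempt the Weber-fraction entropy estimate mentioned in the abstract.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def contingency(joint):
    """Mutual information I(X;Y) divided by H(Y).

    joint[i][j] is P(X=i, Y=j); rows index the predictor X (e.g. a
    response), columns index the predicted event Y (e.g. reinforcement).
    Normalizing by H(Y) is an assumption of this sketch.
    """
    px = [sum(row) for row in joint]        # marginal of X
    py = [sum(col) for col in zip(*joint)]  # marginal of Y
    h_y = entropy(py)
    if h_y == 0:
        return 0.0  # Y never varies, so there is nothing to predict
    mi = sum(
        p * math.log2(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )
    return mi / h_y


# X fully determines Y: contingency is 1 (up to rounding).
perfect = contingency([[0.5, 0.0], [0.0, 0.5]])

# X and Y are independent: contingency is 0.
independent = contingency([[0.25, 0.25], [0.25, 0.25]])
```

Under this assumed normalization the measure ranges from 0 (the predictor carries no information about the event) to 1 (the predictor removes all uncertainty about it), which is the scale on which a critical value such as .25 would be read.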
