Estimating Scale-Invariant Future in Continuous Time.

Authors

Tiganj Zoran, Gershman Samuel J, Sederberg Per B, Howard Marc W

Affiliations

Center for Memory and Brain, Department of Psychological and Brain Sciences, Boston University, Boston, MA 02215, U.S.A.

Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, U.S.A.

Publication

Neural Comput. 2019 Apr;31(4):681-709. doi: 10.1162/neco_a_01171. Epub 2019 Feb 14.

DOI: 10.1162/neco_a_01171
PMID: 30764739
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6959535/
Abstract

Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Widely used reinforcement learning algorithms discretize continuous time and estimate either transition functions from one step to the next (model-based algorithms) or a scalar value of exponentially discounted future reward using the Bellman equation (model-free algorithms). An important drawback of model-based algorithms is that computational cost grows linearly with the amount of time to be simulated. An important drawback of model-free algorithms is the need to select a timescale required for exponential discounting. We present a computational mechanism, developed based on work in psychology and neuroscience, for computing a scale-invariant timeline of future outcomes. This mechanism efficiently computes an estimate of inputs as a function of future time on a logarithmically compressed scale and can be used to generate a scale-invariant power-law-discounted estimate of expected future reward. The representation of future time retains information about what will happen when. The entire timeline can be constructed in a single parallel operation that generates concrete behavioral and neural predictions. This computational mechanism could be incorporated into future reinforcement learning algorithms.
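
The power-law discounting claimed above has a simple numerical illustration: a weighted sum of exponentially discounting units whose decay rates are geometrically spaced approximates a 1/t discount across the range of timescales the rates cover, because the weighted sum approximates the integral of exp(-s*t) over s. Below is a minimal Python sketch of that identity, not the paper's implementation; the rate range, sample counts, and variable names are illustrative assumptions.

    import numpy as np

    # Geometrically spaced decay rates; unit i discounts the future as exp(-s_i * t).
    # Geometric spacing of the rates is what gives the mixture its scale invariance.
    rates = np.geomspace(1e-2, 1e2, 80)   # assumed range, for illustration only
    t = np.geomspace(0.1, 10.0, 200)      # probe times well inside that range

    # Weighting each exponential by its rate (proportional to the spacing between
    # neighboring rates) makes the sum approximate the integral of exp(-s*t) ds,
    # which equals (exp(-s_min*t) - exp(-s_max*t)) / t, i.e. ~1/t in this range.
    discount = (rates[:, None] * np.exp(-rates[:, None] * t[None, :])).sum(axis=0)
    discount /= discount[0]               # normalize at the first probe time

    power_law = t[0] / t                  # exact 1/t under the same normalization
    # Max log deviation is ~0.1 over two decades, while 1/t itself spans log(100) ~ 4.6.
    print(np.abs(np.log(discount / power_law)).max())

Within the covered range the mixture tracks 1/t with no privileged timescale; outside it, the slowest or fastest unit dominates and the discount flattens back toward a single exponential, which is where the scale invariance breaks down.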

Similar Articles

1. Estimating Scale-Invariant Future in Continuous Time. Neural Comput. 2019 Apr;31(4):681-709. doi: 10.1162/neco_a_01171. Epub 2019 Feb 14.
2. Hyperbolically discounted temporal difference learning. Neural Comput. 2010 Jun;22(6):1511-27. doi: 10.1162/neco.2010.08-09-1080.
3. Neural basis of reinforcement learning and decision making. Annu Rev Neurosci. 2012;35:287-308. doi: 10.1146/annurev-neuro-062111-150512. Epub 2012 Mar 29.
4. Multiple memory systems as substrates for multiple decision systems. Neurobiol Learn Mem. 2015 Jan;117:4-13. doi: 10.1016/j.nlm.2014.04.014. Epub 2014 May 15.
5. Temporal-difference reinforcement learning with distributed representations. PLoS One. 2009 Oct 20;4(10):e7362. doi: 10.1371/journal.pone.0007362.
6. Beyond dichotomies in reinforcement learning. Nat Rev Neurosci. 2020 Oct;21(10):576-586. doi: 10.1038/s41583-020-0355-6. Epub 2020 Sep 1.
7. One-shot learning and behavioral eligibility traces in sequential decision making. Elife. 2019 Nov 11;8:e47463. doi: 10.7554/eLife.47463.
8. The ubiquity of model-based reinforcement learning. Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.
9. Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates. Psychol Rev. 2012 Jan;119(1):120-54. doi: 10.1037/a0026435.
10. Reinforcement learning in continuous time and space. Neural Comput. 2000 Jan;12(1):219-45. doi: 10.1162/089976600300015961.

Cited By

1. A multidimensional distributional map of future reward in dopamine neurons. Nature. 2025 Jun;642(8068):691-699. doi: 10.1038/s41586-025-09089-6. Epub 2025 Jun 4.
2. Event boundaries drive norepinephrine release and distinctive neural representations of space in the rodent hippocampus. bioRxiv. 2024 Aug 31:2024.07.30.605900. doi: 10.1101/2024.07.30.605900.
3. Learning temporal relationships between symbols with Laplace Neural Manifolds. ArXiv. 2024 Sep 22:arXiv:2302.10163v4.
4. Scanning a compressed ordered representation of the future. J Exp Psychol Gen. 2022 Dec;151(12):3082-3096. doi: 10.1037/xge0001243. Epub 2022 Aug 1.
5. Time Is of the Essence: Neural Codes, Synchronies, Oscillations, Architectures. Front Comput Neurosci. 2022 Jun 15;16:898829. doi: 10.3389/fncom.2022.898829. eCollection 2022.
6. Predicting the Future With a Scale-Invariant Temporal Memory for the Past. Neural Comput. 2022 Feb 17;34(3):642-685. doi: 10.1162/neco_a_01475.
7. Neural circuits and symbolic processing. Neurobiol Learn Mem. 2021 Dec;186:107552. doi: 10.1016/j.nlm.2021.107552. Epub 2021 Nov 8.
8. A temporal record of the past with a spectrum of time constants in the monkey entorhinal cortex. Proc Natl Acad Sci U S A. 2020 Aug 18;117(33):20274-20283. doi: 10.1073/pnas.1917197117. Epub 2020 Aug 3.
9. Time-conjunctive representations of future events. Mem Cognit. 2020 May;48(4):672-682. doi: 10.3758/s13421-019-00999-1.
10. Bayesian nonparametric models characterize instantaneous strategies in a competitive dynamic game. Nat Commun. 2019 Apr 18;10(1):1808. doi: 10.1038/s41467-019-09789-4.

References

1. The successor representation in human reinforcement learning. Nat Hum Behav. 2017 Sep;1(9):680-692. doi: 10.1038/s41562-017-0180-8. Epub 2017 Aug 28.
2. The Same Hippocampal CA1 Population Simultaneously Codes Temporal Information over Multiple Timescales. Curr Biol. 2018 May 21;28(10):1499-1508.e4. doi: 10.1016/j.cub.2018.03.051. Epub 2018 Apr 26.
3. Compressed Timeline of Recent Experience in Monkey Lateral Prefrontal Cortex. J Cogn Neurosci. 2018 Jul;30(7):935-950. doi: 10.1162/jocn_a_01273. Epub 2018 Apr 26.
4. Sequential Firing Codes for Time in Rodent Medial Prefrontal Cortex. Cereb Cortex. 2017 Dec 1;27(12):5663-5671. doi: 10.1093/cercor/bhw336.
5. Neural scaling laws for an uncertain world. Psychol Rev. 2018 Jan;125(1):47-58. doi: 10.1037/rev0000081. Epub 2017 Oct 16.
6. The hippocampus as a predictive map. Nat Neurosci. 2017 Nov;20(11):1643-1653. doi: 10.1038/nn.4650. Epub 2017 Oct 2.
7. Predicting the Past, Remembering the Future. Curr Opin Behav Sci. 2017 Oct;17:7-13. doi: 10.1016/j.cobeha.2017.05.025. Epub 2017 Jun 9.
8. Temporal and Rate Coding for Discrete Event Sequences in the Hippocampus. Neuron. 2017 Jun 21;94(6):1248-1262.e4. doi: 10.1016/j.neuron.2017.05.024. Epub 2017 Jun 9.
9. Thalamic projections sustain prefrontal activity during working memory maintenance. Nat Neurosci. 2017 Jul;20(7):987-996. doi: 10.1038/nn.4568. Epub 2017 May 3.
10. Dissociated sequential activity and stimulus encoding in the dorsomedial striatum during spatial working memory. Elife. 2016 Sep 16;5:e19507. doi: 10.7554/eLife.19507.