
The dynamics of explore-exploit decisions reveal a signal-to-noise mechanism for random exploration.

Affiliations

Department of Mathematics, Khalifa University of Science and Technology, Abu Dhabi, UAE.

Khalifa University Centre for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, UAE.

Publication information

Sci Rep. 2021 Feb 4;11(1):3077. doi: 10.1038/s41598-021-82530-8.

DOI:10.1038/s41598-021-82530-8
PMID:33542333
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7862437/
Abstract

Growing evidence suggests that behavioral variability plays a critical role in how humans manage the tradeoff between exploration and exploitation. In these decisions a little variability can help us to overcome the desire to exploit known rewards by encouraging us to randomly explore something else. Here we investigate how such 'random exploration' could be controlled using a drift-diffusion model of the explore-exploit choice. In this model, variability is controlled by either the signal-to-noise ratio with which reward is encoded (the 'drift rate'), or the amount of information required before a decision is made (the 'threshold'). By fitting this model to behavior, we find that while, statistically, both drift and threshold change when people randomly explore, numerically, the change in drift rate has by far the largest effect. This suggests that random exploration is primarily driven by changes in the signal-to-noise ratio with which reward information is represented in the brain.
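To make the two candidate mechanisms concrete, here is a minimal simulation sketch of a generic two-boundary drift-diffusion process. It is not the model fitted in the paper: the function names, the symmetric bounds at plus and minus `threshold`, the unbiased starting point, and all parameter values are assumptions chosen for illustration. It shows that lowering either the drift rate or the threshold pushes the probability of choosing the favored option toward 0.5, which is one way of expressing more variable, more random choices.

```python
import math
import random


def simulate_ddm(drift, threshold, noise=1.0, dt=0.002, max_time=10.0, rng=None):
    """Simulate one trial of a symmetric drift-diffusion process.

    Evidence starts at 0 and accumulates until it reaches +threshold
    (returns choice +1) or -threshold (returns choice -1).
    Returns (choice, decision_time); choice is 0 if max_time is reached first.
    """
    rng = rng or random
    x, t = 0.0, 0.0
    step_sd = noise * math.sqrt(dt)  # standard deviation of one Euler step
    while t < max_time:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
        if x >= threshold:
            return 1, t
        if x <= -threshold:
            return -1, t
    return 0, t


def p_upper_analytic(drift, threshold, noise=1.0):
    """Closed-form probability of hitting the upper bound first
    for an unbiased start between bounds at +/- threshold."""
    return 1.0 / (1.0 + math.exp(-2.0 * drift * threshold / noise ** 2))


def p_upper_simulated(drift, threshold, n_trials=1000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(simulate_ddm(drift, threshold, rng=rng)[0] == 1
               for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    # Hypothetical parameter settings, chosen only to contrast the two mechanisms.
    settings = {
        "baseline": (1.0, 1.0),
        "lower drift": (0.25, 1.0),
        "lower threshold": (1.0, 0.25),
    }
    for label, (v, a) in settings.items():
        print(label, round(p_upper_analytic(v, a), 3), p_upper_simulated(v, a))
```

In this simplified setting the choice probability depends only on the product of drift and threshold, so choice data alone cannot tell the two mechanisms apart. Response times can: a lower threshold shortens decisions markedly, whereas a lower drift rate does not, which is why fitting such a model to both choices and response times can separate them.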


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/88486d78cbe0/41598_2021_82530_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/9333067e0dc7/41598_2021_82530_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/ecdf067cbee0/41598_2021_82530_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/d108222be6a5/41598_2021_82530_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/fe2b576b276b/41598_2021_82530_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/09c1e955d882/41598_2021_82530_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/166a35988a0a/41598_2021_82530_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a0/7862437/c7d6c9df5bb7/41598_2021_82530_Fig8_HTML.jpg

Similar articles

1
The dynamics of explore-exploit decisions reveal a signal-to-noise mechanism for random exploration.
Sci Rep. 2021 Feb 4;11(1):3077. doi: 10.1038/s41598-021-82530-8.
2
Humans use directed and random exploration to solve the explore-exploit dilemma.
J Exp Psychol Gen. 2014 Dec;143(6):2074-81. doi: 10.1037/a0038199. Epub 2014 Oct 27.
3
A causal role for right frontopolar cortex in directed, but not random, exploration.
Elife. 2017 Sep 15;6:e27430. doi: 10.7554/eLife.27430.
4
Transcranial Stimulation over Frontopolar Cortex Elucidates the Choice Attributes and Neural Mechanisms Used to Resolve Exploration-Exploitation Trade-Offs.
J Neurosci. 2015 Oct 28;35(43):14544-56. doi: 10.1523/JNEUROSCI.2322-15.2015.
5
Sex differences in learning from exploration.
Elife. 2021 Nov 19;10:e69748. doi: 10.7554/eLife.69748.
6
Subcortical Substrates of Explore-Exploit Decisions in Primates.
Neuron. 2019 Aug 7;103(3):533-545.e5. doi: 10.1016/j.neuron.2019.05.017. Epub 2019 Jun 10.
7
Exploration versus exploitation decisions in the human brain: A systematic review of functional neuroimaging and neuropsychological studies.
Neuropsychologia. 2024 Jan 10;192:108740. doi: 10.1016/j.neuropsychologia.2023.108740. Epub 2023 Nov 29.
8
Humans adaptively resolve the explore-exploit dilemma under cognitive constraints: Evidence from a multi-armed bandit task.
Cognition. 2022 Dec;229:105233. doi: 10.1016/j.cognition.2022.105233. Epub 2022 Jul 30.
9
Sources of suboptimality in a minimalistic explore-exploit task.
Nat Hum Behav. 2019 Apr;3(4):361-368. doi: 10.1038/s41562-018-0526-x. Epub 2019 Feb 11.
10
Ready, set, explore! Event-related potentials reveal the time-course of exploratory decisions.
Brain Res. 2019 Sep 15;1719:183-193. doi: 10.1016/j.brainres.2019.05.039. Epub 2019 May 29.

Cited by

1
Human Strategy Adaptation in Reinforcement Learning Resembles Policy Gradient Ascent.
bioRxiv. 2025 Jul 31:2025.07.28.667308. doi: 10.1101/2025.07.28.667308.
2
Trialing addiction neurocircuitry targets and directionality of brain stimulation effects: A deep TMS/fMRI trial in people with alcohol use disorder.
Contemp Clin Trials Commun. 2025 Jun 30;46:101515. doi: 10.1016/j.conctc.2025.101515. eCollection 2025 Aug.
3
Deep Learning Improves Parameter Estimation in Reinforcement Learning Models.
bioRxiv. 2025 Jun 18:2025.03.21.644663. doi: 10.1101/2025.03.21.644663.
4
Semantic influences on object detection: Drift diffusion modeling provides insights regarding mechanism.
PLoS Comput Biol. 2025 Jun 11;21(6):e1012269. doi: 10.1371/journal.pcbi.1012269. eCollection 2025 Jun.
5
TMS-EEG evidence links random exploration to inhibitory mechanisms in the dorsolateral prefrontal cortex.
Sci Rep. 2025 May 5;15(1):15654. doi: 10.1038/s41598-025-00034-1.
6
Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour.
Comput Psychiatr. 2025 Feb 11;9(1):39-62. doi: 10.5334/cpsy.101. eCollection 2025.
7
A causal role of the right dorsolateral prefrontal cortex in random exploration.
Sci Rep. 2024 Oct 22;14(1):24796. doi: 10.1038/s41598-024-76025-5.
8
A tutorial on open-source large language models for behavioral science.
Behav Res Methods. 2024 Dec;56(8):8214-8237. doi: 10.3758/s13428-024-02455-8. Epub 2024 Aug 15.
9
The structure and development of explore-exploit decision making.
Cogn Psychol. 2024 May;150:101650. doi: 10.1016/j.cogpsych.2024.101650. Epub 2024 Mar 10.
10
The effects of time horizon and guided choices on explore-exploit decisions in rodents.
Behav Neurosci. 2023 Apr;137(2):127-142. doi: 10.1037/bne0000549. Epub 2023 Jan 12.

References

1
Balancing exploration and exploitation with information and randomization.
Curr Opin Behav Sci. 2021 Apr;38:49-56. doi: 10.1016/j.cobeha.2020.10.001. Epub 2020 Nov 6.
2
Dissociable neural correlates of uncertainty underlie different exploration strategies.
Nat Commun. 2020 May 12;11(1):2371. doi: 10.1038/s41467-020-15766-z.
3
Mutual benefits: Combining reinforcement learning with sequential sampling models.
Neuropsychologia. 2020 Jan;136:107261. doi: 10.1016/j.neuropsychologia.2019.107261. Epub 2019 Nov 14.
4
Computational noise in reward-guided learning drives behavioral variability in volatile environments.
Nat Neurosci. 2019 Dec;22(12):2066-2077. doi: 10.1038/s41593-019-0518-9. Epub 2019 Oct 28.
5
Subcortical Substrates of Explore-Exploit Decisions in Primates.
Neuron. 2019 Aug 7;103(3):533-545.e5. doi: 10.1016/j.neuron.2019.05.017. Epub 2019 Jun 10.
6
Generalization guides human exploration in vast decision spaces.
Nat Hum Behav. 2018 Dec;2(12):915-924. doi: 10.1038/s41562-018-0467-4. Epub 2018 Nov 12.
7
The algorithmic architecture of exploration in the human brain.
Curr Opin Neurobiol. 2019 Apr;55:7-14. doi: 10.1016/j.conb.2018.11.003. Epub 2018 Dec 6.
8
Exploration Disrupts Choice-Predictive Signals and Alters Dynamics in Prefrontal Cortex.
Neuron. 2018 Jan 17;97(2):450-461.e9. doi: 10.1016/j.neuron.2017.12.007. Epub 2017 Dec 28.
9
Deconstructing the human algorithms for exploration.
Cognition. 2018 Apr;173:34-42. doi: 10.1016/j.cognition.2017.12.014. Epub 2017 Dec 29.
10
Learning the value of information and reward over time when solving exploration-exploitation problems.
Sci Rep. 2017 Dec 5;7(1):16919. doi: 10.1038/s41598-017-17237-w.