The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning.

Affiliations

Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France.

Département d'Études Cognitives, École Normale Supérieure, Paris, France.

Publication information

PLoS Biol. 2020 Dec 8;18(12):e3001028. doi: 10.1371/journal.pbio.3001028. eCollection 2020 Dec.

DOI:10.1371/journal.pbio.3001028
PMID:33290387
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7723279/
Abstract

While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner's action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator's value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator's actions directly affect the learner's value function. We tested these three hypotheses in 2 experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. We show through model comparison and model simulation that VS provides the best explanation of learner's behavior. Results replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators' choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, where imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.
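
To make the contrast between decision biasing (DB) and value shaping (VS) concrete, here is a minimal Python sketch for a bandit-style task. It is an illustration under stated assumptions, not the authors' fitted models: the softmax policy, the delta-rule update, the mixing form of the bias, the pseudo-reward of 1, and all function and parameter names (`bias`, `alpha_imit`, `beta`) are hypothetical choices made for this example. Model-based imitation (MB) is omitted because it would require an inverse reinforcement learning step that the abstract does not specify.

```python
import math

def softmax(q, beta):
    # Convert action values into choice probabilities (beta = inverse temperature).
    m = max(q)
    exps = [math.exp(beta * (v - m)) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

def q_update(q, action, reward, alpha):
    # Standard delta-rule update; returns a new list and leaves q unchanged.
    new_q = list(q)
    new_q[action] += alpha * (reward - q[action])
    return new_q

def decision_biasing(q, demo_action, beta, bias):
    # DB (assumed form): mix the softmax policy with a point mass on the
    # demonstrator's action. The value function q is not modified.
    p = softmax(q, beta)
    return [(1 - bias) * pi + (bias if a == demo_action else 0.0)
            for a, pi in enumerate(p)]

def value_shaping(q, demo_action, alpha_imit, pseudo_reward=1.0):
    # VS (assumed form): treat the observed action as if it had yielded a
    # pseudo-reward and update the learner's value function directly.
    return q_update(q, demo_action, pseudo_reward, alpha_imit)

q = [0.0, 0.0]                                   # learner's initial values
db_policy = decision_biasing(q, demo_action=1, beta=5.0, bias=0.2)
vs_values = value_shaping(q, demo_action=1, alpha_imit=0.2)
vs_policy = softmax(vs_values, beta=5.0)
```

In this sketch the DB effect disappears as soon as the bias is no longer applied, because `q` is untouched, whereas the VS effect persists in `vs_values` and carries over to later choices. That persistence is the sense in which the demonstrator's action acts like a pseudo-reward.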

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/89dcc7c86a40/pbio.3001028.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/eb4534472188/pbio.3001028.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/502f8eb1f737/pbio.3001028.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/fbdf0ff1ebe5/pbio.3001028.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/44464d43a1ea/pbio.3001028.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/22fb125f9bef/pbio.3001028.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/685e4705237c/pbio.3001028.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5df/7723279/1bcf5212915b/pbio.3001028.g008.jpg