Learning to learn with active adaptive perception.

Affiliation

Building 176 Boldrewood Innovation Campus, University of Southampton, Burgess Road, Southampton SO16 7QF, United Kingdom.

Publication information

Neural Netw. 2019 Jul;115:30-49. doi: 10.1016/j.neunet.2019.03.006. Epub 2019 Mar 25.

DOI:10.1016/j.neunet.2019.03.006
PMID:30959321
Abstract

Increasingly, autonomous agents will be required to operate on long-term missions. This will create a demand for general intelligence because feedback from a human operator may be sparse and delayed, and because not all behaviours can be prescribed. Deep neural networks and reinforcement learning methods can be applied in such environments, but their fixed updating routines imply an inductive bias in learning spatio-temporal patterns, meaning some environments will be unsolvable. To address this problem, this paper proposes active adaptive perception, the ability of an architecture to learn when and how to modify and selectively utilise its perception module. To achieve this, a generic architecture based on a self-modifying policy (SMP) is proposed and implemented using Incremental Self-improvement with the Success Story Algorithm. The architecture contrasts with deep reinforcement learning systems, which follow fixed training strategies, and with earlier SMP studies, which for perception relied either entirely on working memory or on untrainable active perception instructions. One computationally cheap and one more expensive implementation are presented and compared, on various non-episodic partially observable mazes, with DRQN, an off-policy deep reinforcement learner using experience replay, and with Incremental Self-improvement, an SMP. The results show that the simple instruction set leads to emergent strategies for avoiding detracting corridors and rooms, and that the expensive implementation allows perception to be selectively ignored where it is inaccurate.
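The abstract names the Success Story Algorithm (SSA) as the mechanism that decides which self-modifications are kept. As a rough illustration only, the sketch below implements the Success-Story Criterion as it is usually stated in the SSA literature by Schmidhuber and colleagues: at a checkpoint at time t, with cumulative reward R(t) and still-valid modifications made at times t_1 < ... < t_n, the reward-per-time ratios R(t)/t < (R(t) - R(t_1))/(t - t_1) < ... < (R(t) - R(t_n))/(t - t_n) must be strictly increasing, and the most recent modifications are undone until they are. This is not the paper's implementation: the flat parameter dict, one checkpoint per modification, and the class and method names are all hypothetical.

```python
# Sketch of the Success-Story Criterion (SSC) check; not the paper's code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Checkpoint:
    """One stack entry: when a policy modification happened and how to undo it."""
    time: float              # time t_i at which the modification was made
    reward: float            # cumulative reward R(t_i) at that moment
    saved: Dict[str, float]  # parameter values *before* the modification


class SuccessStoryLearner:
    """Hypothetical learner whose policy is a flat dict of named parameters."""

    def __init__(self, params: Dict[str, float]) -> None:
        self.params = dict(params)
        self.stack: List[Checkpoint] = []

    def _ratios(self, t: float, R: float) -> List[float]:
        # Reward per unit time since the start (t_0 = 0, R(0) = 0), then since
        # each still-valid checkpoint, ordered from oldest to newest.
        ratios = [R / t]
        for cp in self.stack:
            ratios.append((R - cp.reward) / (t - cp.time))
        return ratios

    def ssc_holds(self, t: float, R: float) -> bool:
        # Success-Story Criterion: the ratios must be strictly increasing.
        r = self._ratios(t, R)
        return all(a < b for a, b in zip(r, r[1:]))

    def evaluate(self, t: float, R: float) -> int:
        """At a checkpoint, undo the most recent modifications until SSC holds.

        Returns the number of modifications that were undone.
        """
        assert t > 0 and all(t > cp.time for cp in self.stack)
        undone = 0
        while self.stack and not self.ssc_holds(t, R):
            cp = self.stack.pop()         # most recent modification first
            self.params.update(cp.saved)  # restore the values it overwrote
            undone += 1
        return undone

    def modify(self, t: float, R: float, changes: Dict[str, float]) -> None:
        """Apply a policy modification and push a checkpoint for it.

        `changes` must name parameters that already exist in the policy.
        """
        saved = {k: self.params[k] for k in changes}
        self.params.update(changes)
        self.stack.append(Checkpoint(time=t, reward=R, saved=saved))


if __name__ == "__main__":
    learner = SuccessStoryLearner({"w": 0.0})
    learner.modify(t=10, R=5.0, changes={"w": 1.0})
    # Reward rate since the change is (6 - 5) / 10 = 0.1, below the overall
    # rate 6 / 20 = 0.3, so the modification is undone.
    print(learner.evaluate(t=20, R=6.0), learner.params)  # 1 {'w': 0.0}
```

Because checkpoints are popped in last-in, first-out order and each one stores the values it overwrote, undoing them restores the policy exactly as it was before those modifications.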


Similar articles

1
Learning to learn with active adaptive perception.
Neural Netw. 2019 Jul;115:30-49. doi: 10.1016/j.neunet.2019.03.006. Epub 2019 Mar 25.
2
Integrating temporal difference methods and self-organizing neural networks for reinforcement learning with delayed evaluative feedback.
IEEE Trans Neural Netw. 2008 Feb;19(2):230-44. doi: 10.1109/TNN.2007.905839.
3
Autonomous reinforcement learning with experience replay.
Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.
4
Feedback for reinforcement learning based brain-machine interfaces using confidence metrics.
J Neural Eng. 2017 Jun;14(3):036016. doi: 10.1088/1741-2552/aa6317. Epub 2017 Feb 27.
5
Emergent Solutions to High-Dimensional Multitask Reinforcement Learning.
Evol Comput. 2018 Fall;26(3):347-380. doi: 10.1162/evco_a_00232. Epub 2018 Jun 22.
6
Dynamic evolving spiking neural networks for on-line spatio- and spectro-temporal pattern recognition.
Neural Netw. 2013 May;41:188-201. doi: 10.1016/j.neunet.2012.11.014. Epub 2012 Dec 20.
7
Deep reinforcement learning for automated radiation adaptation in lung cancer.
Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
8
Lifelong learning of human actions with deep neural network self-organization.
Neural Netw. 2017 Dec;96:137-149. doi: 10.1016/j.neunet.2017.09.001. Epub 2017 Sep 20.
9
Spatio-temporal memories for machine learning: a long-term memory organization.
IEEE Trans Neural Netw. 2009 May;20(5):768-80. doi: 10.1109/TNN.2009.2012854. Epub 2009 Mar 27.
10
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.

Cited by

1
Current Situation and Strategy Formulation of College Sports Psychology Teaching Following Adaptive Learning and Deep Learning Under Information Education.
Front Psychol. 2022 Jan 17;12:766621. doi: 10.3389/fpsyg.2021.766621. eCollection 2021.