Online Reinforcement Learning Using a Probability Density Estimation.

Authors

Agostini Alejandro, Celaya Enric

Affiliations

Bernstein Center for Computational Neuroscience, 37077 Göttingen, Germany

Institut de Robòtica i Informàtica Industrial (CSIC-UPC), 08028 Barcelona, Spain

Publication information

Neural Comput. 2017 Jan;29(1):220-246. doi: 10.1162/NECO_a_00906. Epub 2016 Oct 20.

DOI:10.1162/NECO_a_00906
PMID:27764590
Abstract

Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
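
The abstract contrasts forgetting that depends only on time with forgetting that is modulated by the new information each mixture component receives. The toy Python sketch below illustrates that contrast. Its assumptions are ours, not the paper's: a 1-D input, fixed component centers with a shared width, per-component accumulators W_j (sample weight) and S_j (weighted target sum), and a modulated factor lam ** r_j, where r_j is the component's responsibility for the new sample. The class ForgettingMixture, the helper run, and all parameter values are hypothetical. The density-dependent choice of the base factor, which the abstract describes without giving a formula, is not modeled.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a 1-D normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


class ForgettingMixture:
    """Hypothetical toy: online 1-D Gaussian mixture tracking a scalar target.

    Centers and width are fixed; only each component's accumulated weight W[j]
    and weighted target sum S[j] are learned. Mixing weights are W[j] / sum(W).
    """

    def __init__(self, centers, sigma, lam, modulated):
        self.centers = list(centers)
        self.sigma = sigma
        self.lam = lam              # base forgetting factor, 0 < lam <= 1
        self.modulated = modulated  # True: lam ** r_j per component; False: lam for all
        self.W = [1e-6] * len(self.centers)
        self.S = [0.0] * len(self.centers)

    def _weighted_kernels(self, x):
        # Unnormalized pi_j * N(x; c_j, sigma), with pi_j = W[j] / sum(W).
        total_w = sum(self.W)
        return [w / total_w * gaussian_pdf(x, c, self.sigma)
                for w, c in zip(self.W, self.centers)]

    def density(self, x):
        """Mixture estimate of the sample density p(x)."""
        return sum(self._weighted_kernels(x))

    def responsibilities(self, x):
        """Posterior probability of each component given x (an E-step)."""
        k = self._weighted_kernels(x)
        total = sum(k)
        if total == 0.0:  # x too far from every center: fall back to uniform
            return [1.0 / len(k)] * len(k)
        return [v / total for v in k]

    def update(self, x, y):
        """Fold one sample (x, y) into the accumulated statistics."""
        for j, r in enumerate(self.responsibilities(x)):
            # Time-based: every component decays each step, visited or not.
            # Modulated: decay shrinks with the share r of the sample j receives.
            f = self.lam ** r if self.modulated else self.lam
            self.W[j] = f * self.W[j] + r
            self.S[j] = f * self.S[j] + r * y

    def predict(self, x):
        """Responsibility-weighted average of the per-component target means."""
        means = [s / w if w > 0.0 else 0.0 for s, w in zip(self.S, self.W)]
        return sum(r * m for r, m in zip(self.responsibilities(x), means))


def run(modulated):
    model = ForgettingMixture([0.0, 5.0], sigma=1.0, lam=0.95, modulated=modulated)
    # Phase 1: both regions are visited evenly (target 0 near x=0, 10 near x=5).
    for _ in range(100):
        model.update(0.0, 0.0)
        model.update(5.0, 10.0)
    # Phase 2: biased sampling; the trajectory stays near x=0 and never returns.
    for _ in range(2000):
        model.update(0.0, 0.0)
    return model


for modulated in (False, True):
    m = run(modulated)
    print("modulated" if modulated else "time-based", round(m.predict(5.0), 3))
```

In the time-based variant, the rarely visited component near x = 5 decays along with every other component. Its mixing weight collapses, and the heavily sampled component near x = 0 (target 0) takes over the prediction there. In the modulated variant, that component receives almost no responsibility for the x = 0 samples, so it forgets almost nothing and its estimate near 10 survives. With lam ** r_j, forgetting in such regions is slowed, not stopped entirely.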

Similar articles

1. Online Reinforcement Learning Using a Probability Density Estimation.
   Neural Comput. 2017 Jan;29(1):220-246. doi: 10.1162/NECO_a_00906. Epub 2016 Oct 20.
2. Online EM with weight-based forgetting.
   Neural Comput. 2015 May;27(5):1142-57. doi: 10.1162/NECO_a_00723. Epub 2015 Feb 24.
3. Laplace Approximation for Divisive Gaussian Processes for Nonstationary Regression.
   IEEE Trans Pattern Anal Mach Intell. 2016 Mar;38(3):618-24. doi: 10.1109/TPAMI.2015.2452914.
4. Online Direct Density-Ratio Estimation Applied to Inlier-Based Outlier Detection.
   Neural Comput. 2015 Sep;27(9):1899-914. doi: 10.1162/NECO_a_00761. Epub 2015 Jul 10.
5. Selective Memory Recursive Least Squares: Recast Forgetting Into Memory in RBF Neural Network-Based Real-Time Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6767-6779. doi: 10.1109/TNNLS.2024.3385407. Epub 2025 Apr 4.
6. Self-organizing mixture networks for probability density estimation.
   IEEE Trans Neural Netw. 2001;12(2):405-11. doi: 10.1109/72.914534.
7. Model-based reinforcement learning for partially observable games with sampling-based state estimation.
   Neural Comput. 2007 Nov;19(11):3051-87. doi: 10.1162/neco.2007.19.11.3051.
8. Human motion tracking by temporal-spatial local gaussian process experts.
   IEEE Trans Image Process. 2011 Apr;20(4):1141-51. doi: 10.1109/TIP.2010.2076820. Epub 2010 Sep 16.
9. Divisive Gaussian processes for nonstationary regression.
   IEEE Trans Neural Netw Learn Syst. 2014 Nov;25(11):1991-2003. doi: 10.1109/TNNLS.2014.2301951.
10. Reinforcement Learning-Based Multi-AUV Adaptive Trajectory Planning for Under-Ice Field Estimation.
    Sensors (Basel). 2018 Nov 9;18(11):3859. doi: 10.3390/s18113859.