An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits.

Author information

Sledge Isaac J, Príncipe José C

Affiliations

Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA.

Computational NeuroEngineering Laboratory (CNEL), University of Florida, Gainesville, FL 32611, USA.

Publication information

Entropy (Basel). 2018 Feb 28;20(3):155. doi: 10.3390/e20030155.

Abstract

In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls.
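As a rough illustration of the mechanism the abstract describes, the sketch below simulates a Bernoulli bandit under a softmax (Gibbs) policy whose temperature-like parameter is annealed over time. A high temperature spreads pulls across arms (exploration), and a low temperature concentrates them on the arm with the best estimate (exploitation). This is a stand-in under stated assumptions, not the authors' algorithm. The Gibbs form of the policy, the helper names `gibbs_policy` and `run_bandit`, the constant `c`, and the cooling schedule `c / log(t + 2)` are all hypothetical choices made for illustration. The sketch does not reproduce the paper's regret analysis.

```python
import math
import random


def gibbs_policy(means, temperature):
    """Softmax (Gibbs) action probabilities over estimated arm means.

    Assumption: the information-constrained policy is modelled here as a
    plain softmax; this is an illustrative stand-in, not the paper's policy.
    """
    m = max(means)  # subtract the max so exp() cannot overflow
    weights = [math.exp((q - m) / temperature) for q in means]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(true_probs, pulls, c=1.0, seed=0):
    """Simulate an annealed softmax policy on a Bernoulli bandit.

    Hypothetical cooling schedule: temperature_t = c / log(t + 2).
    Returns (cumulative expected regret, number of pulls per arm).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    means = [0.0] * k
    best = max(true_probs)
    regret = 0.0
    for t in range(pulls):
        if t < k:
            arm = t  # pull each arm once to initialise its estimate
        else:
            temperature = c / math.log(t + 2)  # cools as t grows
            probs = gibbs_policy(means, temperature)
            arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        regret += best - true_probs[arm]  # expected regret of this pull
    return regret, counts
```

For example, `run_bandit([0.2, 0.5, 0.7], pulls=5000)` returns the accumulated expected regret together with the per-arm pull counts. Different cooling constants `c` can then be compared by how quickly the pulls concentrate on the best arm.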
