

Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm.

Authors

Kamiura Moto, Sano Kohei

Affiliations

Graduate School of Science and Engineering, Tokyo Denki University, Japan; School of Science and Engineering, Tokyo Denki University, Japan.

Graduate School of Science and Engineering, Tokyo Denki University, Japan.

Publication information

Biosystems. 2017 Oct;160:25-32. doi: 10.1016/j.biosystems.2017.08.004. Epub 2017 Aug 22.

Abstract

The principle of optimism in the face of uncertainty is known as a heuristic in sequential decision-making problems. The Overtaking method, which is based on this principle, is an effective algorithm for solving multi-armed bandit problems. In a previous study, it was defined by a set of heuristic formulation patterns. The objective of the present paper is to redefine the value functions of the Overtaking method and to unify their formulation. The unified Overtaking method is associated with the upper bounds of statistical confidence intervals for expected rewards. This unification makes the Overtaking method more widely applicable. As a result, we obtain a new Overtaking method for exponentially distributed rewards, analyze it numerically, and show that it outperforms the UCB algorithm on average. The present study suggests that, in the context of multi-armed bandit problems, the principle of optimism in the face of uncertainty should be regarded not as a heuristic but as a statistics-based consequence of the law of large numbers for the sample mean of rewards and of the estimation of upper bounds of expected rewards.
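As a rough illustration of the baseline mentioned above, the following sketch runs a UCB-style index policy on arms with exponentially distributed rewards. It is not the Overtaking method, whose value functions are not given in this abstract. The choice of the UCB1 variant, the exploration constant `c`, the arm means, and the names `ucb1_index` and `run_ucb1` are all assumptions made for the example. Note also that the usual UCB1 analysis assumes bounded rewards, which exponential rewards are not, so this is only meant to show the "sample mean plus upper-bound bonus" idea.

```python
import math
import random


def ucb1_index(mean, pulls, t, c=2.0):
    """Sample mean plus an exploration bonus that shrinks as pulls grows.

    This is the textbook UCB1 form; the constant c is an assumption.
    """
    return mean + math.sqrt(c * math.log(t) / pulls)


def run_ucb1(true_means, horizon, seed=0):
    """Play `horizon` rounds and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Play each arm once so every sample mean is defined.
            arm = t - 1
        else:
            # Optimism: pick the arm with the largest upper-bound index.
            arm = max(
                range(k),
                key=lambda a: ucb1_index(sums[a] / pulls[a], pulls[a], t),
            )
        # Exponential reward with the given mean (rate = 1 / mean).
        reward = rng.expovariate(1.0 / true_means[arm])
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


# Hypothetical three-armed problem; the last arm has the highest mean.
pulls = run_ucb1([0.5, 1.0, 2.0], horizon=5000)
print(pulls)
```

As the number of pulls of an arm grows, its sample mean settles near the true mean by the law of large numbers and the bonus term shrinks, so the policy concentrates its pulls on the arm with the highest expected reward.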

