

Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.

Authors

Cavenaghi Emanuele, Sottocornola Gabriele, Stella Fabio, Zanker Markus

Affiliations

Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano-Bicocca, 20126 Milano, Italy.

Facoltà di Scienze e Tecnologie Informatiche, Free University of Bozen-Bolzano, 39100 Bolzano, Italy.

Publication

Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.

DOI:10.3390/e23030380
PMID:33807028
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8004723/
Abstract

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm for non-stationary environments, that is, for data streams affected by concept drift. The algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to contrast concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), and an averaged (f=mean) version of the algorithm. A rich set of numerical experiments is performed to evaluate the algorithm against both stationary and non-stationary state-of-the-art TS baselines. We exploit synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual, and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The proposed approach emerges as the best-performing MAB algorithm: at least one of its versions performs better than the baselines in synthetic environments, proving its robustness under different concept drift types. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
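The mechanism the abstract describes — a discounted reward history plus a per-arm sliding window, combined through an aggregation function f(.) — can be sketched as follows. This is a minimal Beta-Bernoulli illustration of that idea, not the authors' implementation; the class and parameter names (`gamma`, `window`) are assumptions introduced here.

```python
import random
from collections import deque

class DriftAwareTSArm:
    """One arm of a drift-aware Thompson Sampling bandit: a discounted
    Beta posterior plus a sliding-window Beta posterior (sketch only)."""

    def __init__(self, gamma=0.99, window=50):
        self.gamma = gamma                  # discount factor on the reward history
        self.succ = 0.0                     # discounted success count
        self.fail = 0.0                     # discounted failure count
        self.recent = deque(maxlen=window)  # arm-related sliding window of rewards

    def update(self, reward):
        # Discount the whole history, then add the new 0/1 observation.
        self.succ = self.gamma * self.succ + reward
        self.fail = self.gamma * self.fail + (1 - reward)
        self.recent.append(reward)

    def sample(self, f=min):
        # One draw from each posterior (Beta(1,1) prior), combined by f(.):
        # f=min gives the pessimistic variant, f=max the optimistic one,
        # and an averaging function the averaged one.
        s_disc = random.betavariate(self.succ + 1, self.fail + 1)
        wins = sum(self.recent)
        s_win = random.betavariate(wins + 1, len(self.recent) - wins + 1)
        return f(s_disc, s_win)

def choose(arms, f=min):
    # Standard TS selection: play the arm with the largest aggregated draw.
    return max(range(len(arms)), key=lambda i: arms[i].sample(f))
```

Under sudden drift the window posterior forgets quickly while the discounted one degrades smoothly; taking f=min keeps the agent cautious about arms whose recent evidence disagrees with their long-run history, which is consistent with the pessimistic version winning on the real-world tasks.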


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/f3870a7fb604/entropy-23-00380-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/fd3ad6449d5a/entropy-23-00380-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/14cc4fa56cb8/entropy-23-00380-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/bcf46e14a949/entropy-23-00380-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/dcca583aec25/entropy-23-00380-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/b0008cff9df6/entropy-23-00380-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/750c4d576948/entropy-23-00380-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/6c6768b02a1b/entropy-23-00380-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/c90fd2b69861/entropy-23-00380-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/dee0be112c7b/entropy-23-00380-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/9460dac73b60/entropy-23-00380-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/ffd1512629f9/entropy-23-00380-g0A1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/59b545e5670f/entropy-23-00380-g0A2a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/20f7a6b75348/entropy-23-00380-g0A3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c1c/8004723/d02f4226ff8c/entropy-23-00380-g0A4.jpg

Similar articles

1. Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.
   Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.
2. An empirical evaluation of active inference in multi-armed bandits.
   Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
3. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
   Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
4. Gateway Selection in Millimeter Wave UAV Wireless Networks Using Multi-Player Multi-Armed Bandit.
   Sensors (Basel). 2020 Jul 16;20(14):3947. doi: 10.3390/s20143947.
5. A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.
   IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.
6. Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm.
   Biosystems. 2017 Oct;160:25-32. doi: 10.1016/j.biosystems.2017.08.004. Epub 2017 Aug 22.
7. Designing a Streaming Algorithm for Outlier Detection in Data Mining-An Incremental Approach.
   Sensors (Basel). 2020 Feb 26;20(5):1261. doi: 10.3390/s20051261.
8. Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control.
   Technometrics. 2023;65(1):33-43. doi: 10.1080/00401706.2022.2054861. Epub 2022 Apr 22.
9. Risk-aware multi-armed bandit problem with application to portfolio selection.
   R Soc Open Sci. 2017 Nov 15;4(11):171377. doi: 10.1098/rsos.171377. eCollection 2017 Nov.
10. An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
   IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.

Cited by

1. Thompson Sampling for Non-Stationary Bandit Problems.
   Entropy (Basel). 2025 Jan 9;27(1):51. doi: 10.3390/e27010051.
2. Maximum Entropy Exploration in Contextual Bandits with Neural Networks and Energy Based Models.
   Entropy (Basel). 2023 Jan 18;25(2):188. doi: 10.3390/e25020188.
3. Similar network compositions, but distinct neural dynamics underlying belief updating in environments with and without explicit outcomes.

References

1. Microbial communities in the tropical air ecosystem follow a precise diel cycle.
   Proc Natl Acad Sci U S A. 2019 Nov 12;116(46):23299-23308. doi: 10.1073/pnas.1908493116. Epub 2019 Oct 28.
2. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.
   Stat Sci. 2015;30(2):199-215. doi: 10.1214/14-STS504.
3. Incremental learning of concept drift in nonstationary environments.
   IEEE Trans Neural Netw. 2011 Oct;22(10):1517-31. doi: 10.1109/TNN.2011.2160459. Epub 2011 Aug 4.