Multiarmed Bandit Algorithms on Zynq System-on-Chip: Go Frequentist or Bayesian?

Author information

Santosh S V Sai, Darak Sumit J

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2602-2615. doi: 10.1109/TNNLS.2022.3190509. Epub 2024 Feb 5.

Abstract

Multiarmed bandit (MAB) algorithms identify the best arm among multiple arms via an exploration-exploitation trade-off, without prior knowledge of the arm statistics. Their usefulness in wireless radio, Internet of Things (IoT), and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian Thompson sampling (TS) algorithm offers better performance than the frequentist upper confidence bound (UCB) algorithm. However, TS is not synthesizable due to the Beta function. We address this problem by approximating it via a pseudorandom number generator (PRNG)-based architecture and efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli, Gaussian) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of the appropriate MAB algorithm in an unknown environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need to implement the algorithms in parallel, resulting in large savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word lengths, and hardware-software codesign approaches. We demonstrate the superiority of the RI-MAB algorithm and its architecture over the TS and UCB algorithms.
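To make the frequentist versus Bayesian contrast in the abstract concrete, the following is a minimal software sketch of the two arm-selection rules for Bernoulli arms. It is an illustration under stated assumptions, not the paper's architecture: the function names (`ucb1_select`, `thompson_select`, `run_bandit`), the UCB1 exploration constant, and the uniform Beta(1, 1) prior are choices made here, and the Beta draw uses Python's built-in `random.betavariate` rather than the PRNG-based hardware approximation the authors describe.

```python
import math
import random


def ucb1_select(counts, successes, t):
    """Frequentist rule: pick the arm with the largest UCB1 index."""
    # Play every arm once before the index is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        successes[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def thompson_select(counts, successes, rng):
    """Bayesian rule: draw from each arm's Beta posterior, pick the largest draw."""
    # Assumption: uniform Beta(1, 1) prior on each arm's success probability.
    samples = [
        rng.betavariate(successes[a] + 1, counts[a] - successes[a] + 1)
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=samples.__getitem__)


def run_bandit(policy, means, horizon, seed=0):
    """Simulate Bernoulli arms and return how often each arm was played."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    successes = [0] * len(means)
    for t in range(1, horizon + 1):
        if policy == "ucb":
            arm = ucb1_select(counts, successes, t)
        else:
            arm = thompson_select(counts, successes, rng)
        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        successes[arm] += reward
    return counts


if __name__ == "__main__":
    arm_means = [0.1, 0.5, 0.9]  # hypothetical arm statistics, unknown to the learner
    print("UCB plays:", run_bandit("ucb", arm_means, 2000))
    print("TS plays: ", run_bandit("ts", arm_means, 2000))
```

In this sketch the only step that differs between the two policies is arm selection, which is one way to picture why switching between algorithms, rather than running them side by side, could be attractive on constrained hardware.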

Similar articles

Two-Stage Multiarmed Bandit for Reconfigurable Intelligent Surface Aided Millimeter Wave Communications.

Sensors (Basel). 2022 Mar 10;22(6):2179. doi: 10.3390/s22062179.

Enhanced Dynamic Spectrum Access in UAV Wireless Networks for Post-Disaster Area Surveillance System: A Multi-Player Multi-Armed Bandit Approach.

Sensors (Basel). 2021 Nov 25;21(23):7855. doi: 10.3390/s21237855.

An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem.

IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5565-5580. doi: 10.1109/TNNLS.2018.2806006. Epub 2018 Mar 8.

Gateway Selection in Millimeter Wave UAV Wireless Networks Using Multi-Player Multi-Armed Bandit.

Sensors (Basel). 2020 Jul 16;20(14):3947. doi: 10.3390/s20143947.

Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?

Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.

A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5332-5341. doi: 10.1109/TNNLS.2023.3295360. Epub 2023 Sep 1.

Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm.

Entropy (Basel). 2021 Mar 23;23(3):380. doi: 10.3390/e23030380.

Foraging decisions as multi-armed bandit problems: Applying reinforcement learning algorithms to foraging data.

J Theor Biol. 2019 Apr 21;467:48-56. doi: 10.1016/j.jtbi.2019.02.002. Epub 2019 Feb 6.

An empirical evaluation of active inference in multi-armed bandits.

Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
