Santosh S V Sai, Darak Sumit J
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2602-2615. doi: 10.1109/TNNLS.2022.3190509. Epub 2024 Feb 5.
Multiarmed bandit (MAB) algorithms identify the best of several arms through an exploration-exploitation trade-off, without prior knowledge of the arm statistics. Their usefulness in wireless radio, Internet of Things (IoT), and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian Thompson sampling (TS) algorithm offers better performance than the frequentist upper confidence bound (UCB) algorithm. However, TS is not synthesizable because of the Beta function. We address this problem by approximating it with a pseudorandom number generator (PRNG)-based architecture, and we efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli, Gaussian) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of the appropriate MAB algorithm in an unknown environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need to implement the algorithms in parallel, resulting in large savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word lengths, and hardware-software codesign approaches. We demonstrate the superiority of the RI-MAB algorithm and its architecture over the TS and UCB algorithms.
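The two algorithms compared in the abstract can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware architecture: it assumes Bernoulli arms, Beta(1, 1) priors for TS, and the standard UCB1 index for UCB. The function names and the arm probabilities are hypothetical. Beta samples are drawn with the standard library's `random.betavariate`, which is the software counterpart of the Beta-function step that the abstract identifies as the obstacle to synthesis.

```python
import math
import random


def thompson_sampling(probs, horizon, rng):
    """Bernoulli Thompson sampling with Beta(1, 1) priors.

    probs   : true success probability of each arm (unknown to the learner)
    horizon : number of rounds to play
    Returns the number of times each arm was pulled.
    """
    k = len(probs)
    wins = [0] * k
    losses = [0] * k
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior, play the largest.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        if rng.random() < probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [wins[i] + losses[i] for i in range(k)]


def ucb1(probs, horizon, rng):
    """UCB1: play each arm once, then maximize mean + sqrt(2 ln t / n)."""
    k = len(probs)
    pulls = [0] * k
    reward_sum = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(
                range(k),
                key=lambda i: reward_sum[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        reward_sum[arm] += reward
    return pulls


if __name__ == "__main__":
    arm_probs = [0.2, 0.5, 0.8]  # hypothetical environment
    print("TS pulls: ", thompson_sampling(arm_probs, 2000, random.Random(0)))
    print("UCB pulls:", ucb1(arm_probs, 2000, random.Random(0)))
```

Both functions return per-arm pull counts, so the share of rounds spent on the best arm can be compared directly. The sketch covers only the algorithmic behavior; it says nothing about area, power, or execution time on an SoC.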