Master-Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints.

Authors

Huang Hanchi, Shen Li, Ye Deheng, Liu Wei

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17608-17619. doi: 10.1109/TNNLS.2023.3306801. Epub 2024 Dec 2.

DOI:10.1109/TNNLS.2023.3306801
PMID:37999964
Abstract

We propose a novel master-slave architecture to solve the top-K combinatorial multiarmed bandits (CMABs) problem with nonlinear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinct merits to generate diversified samples that balance rewards, constraints, and efficiency. Moreover, we propose teacher learning-based optimization and the policy cotraining technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network (NeuralUCB) to decide on a tradeoff between exploration and exploitation. Thanks to the elaborate design of slave models, the cotraining mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms on both synthetic and real datasets for recommendation tasks. The code is available at https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits.
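
The master-slave selection loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation (see their repository for that). It uses two hand-written candidate generators instead of the six slave models, a simple count-based exploration bonus instead of the NeuralUCB bonus, and a per-category cap as a stand-in diversity constraint, since the abstract does not say what form the constraint takes. All function and variable names are hypothetical.

```python
import math
from collections import Counter

def is_diverse(arm, category, cap):
    """Assumed diversity constraint: at most `cap` items per category."""
    counts = Counter(category[i] for i in arm)
    return all(n <= cap for n in counts.values())

def greedy_slave(values, k):
    """Hypothetical slave: top-k items by estimated value, ignoring constraints."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    return tuple(sorted(order[:k]))

def diverse_slave(values, category, k, cap):
    """Hypothetical slave: greedy by value, but respecting the per-category cap."""
    chosen, counts = [], Counter()
    for i in sorted(range(len(values)), key=lambda i: -values[i]):
        if counts[category[i]] < cap:
            chosen.append(i)
            counts[category[i]] += 1
        if len(chosen) == k:
            break
    return tuple(sorted(chosen))

def master_select(candidates, estimate, pulls, t, category, cap, c=1.0):
    """Master: keep feasible candidates, score each by estimate + bonus, take the max.

    `estimate` stands in for the reward network; the bonus is a count-based
    UCB term (an assumption, not the gradient-based NeuralUCB bonus).
    """
    best, best_score = None, -math.inf
    for arm in candidates:
        if not is_diverse(arm, category, cap):
            continue  # drop candidates that violate the diversity constraint
        bonus = c * math.sqrt(math.log(t + 1) / (pulls.get(arm, 0) + 1))
        score = estimate(arm) + bonus
        if score > best_score:
            best, best_score = arm, score
    return best

# Toy problem: six items, three categories, choose K = 3 with at most 2 per category.
values = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
category = ["a", "a", "a", "b", "b", "c"]
K, CAP = 3, 2

def estimate(arm):
    # Nonlinear set-level reward: probability that at least one item succeeds.
    return 1.0 - math.prod(1.0 - values[i] for i in arm)

candidates = [
    greedy_slave(values, K),                  # (0, 1, 2): infeasible, three "a" items
    diverse_slave(values, category, K, CAP),  # (0, 1, 3)
    (0, 3, 5),                                # an extra hand-picked feasible candidate
]

# With no history, the highest estimate among feasible candidates wins.
first = master_select(candidates, estimate, {}, 1, category, CAP)
# After (0, 1, 3) has been played 50 times, the bonus favors the untried set.
second = master_select(candidates, estimate, {(0, 1, 3): 50}, 51, category, CAP)
print(first, second)
```

In this toy run the master first exploits the best feasible candidate and later explores the less-played one, which is the exploration-exploitation tradeoff the abstract attributes to the master model.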

Similar articles

1. Master-Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints.
   IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17608-17619. doi: 10.1109/TNNLS.2023.3306801. Epub 2024 Dec 2.
2. A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.
   IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9887-9899. doi: 10.1109/TNNLS.2022.3161806. Epub 2023 Nov 30.
3. An empirical evaluation of active inference in multi-armed bandits.
   Neural Netw. 2021 Dec;144:229-246. doi: 10.1016/j.neunet.2021.08.018. Epub 2021 Aug 26.
4. A Contextual-Bandit-Based Approach for Informed Decision-Making in Clinical Trials.
   Life (Basel). 2022 Aug 21;12(8):1277. doi: 10.3390/life12081277.
5. An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward.
   IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2285-2291. doi: 10.1109/TNNLS.2020.2995920. Epub 2021 May 3.
6. Per-Round Knapsack-Constrained Linear Submodular Bandits.
   Neural Comput. 2016 Dec;28(12):2757-2789. doi: 10.1162/NECO_a_00887. Epub 2016 Sep 14.
7. Maximum Entropy Exploration in Contextual Bandits with Neural Networks and Energy Based Models.
   Entropy (Basel). 2023 Jan 18;25(2):188. doi: 10.3390/e25020188.
8. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
   Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
9. Control design and implementation of a novel master-slave surgery robot system, MicroHand A.
   Int J Med Robot. 2011 Sep;7(3):334-47. doi: 10.1002/rcs.403. Epub 2011 Jul 5.
10. An Efficient Algorithm for Deep Stochastic Contextual Bandits.
    Proc AAAI Conf Artif Intell. 2021 Feb;35(12):11193-11201.