Multi-Armed Bandit-Based User Network Node Selection

Authors

Gao Qinyan, Xie Zhidong

Affiliations

National Innovation Institute of Defense Technology, Academy of Military Science, Beijing 100010, China.

Intelligent Game and Decision Laboratory, Academy of Military Science, Beijing 100091, China.

Publication information

Sensors (Basel). 2024 Jun 24;24(13):4104. doi: 10.3390/s24134104.

DOI: 10.3390/s24134104
PMID: 39000883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11244562/

Abstract

In an integrated space-air-ground emergency communication network, users must quickly identify the optimal network node while network states are uncertain and fluctuate stochastically. This study models the problem as a Multi-Armed Bandit (MAB) and proposes an optimization algorithm based on dynamic variance sampling (DVS). The algorithm assumes that the prior distribution of each node's network state is normal and constructs that distribution's expected value and variance so as to make maximal use of the sample data, balancing exploitation of the data against exploration of the unknown. A theoretical analysis shows that the algorithm's Bayesian regret grows sublinearly. Simulations show that the algorithm outperforms the traditional ε-greedy, Upper Confidence Bound (UCB), and Thompson sampling algorithms, with higher cumulative reward, lower total regret, faster convergence, and higher system throughput.
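
To make the bandit framing concrete, here is a minimal sketch of Gaussian Thompson sampling for node selection. This is one of the baselines named in the abstract, not the authors' DVS algorithm, whose variance rule the abstract does not specify. The function names, the `prior_var / (n + 1)` sampling variance, the reward means, and the noise level are all illustrative assumptions.

```python
import random


def gaussian_thompson_select(means, counts, rng, prior_var=1.0):
    """Sample a plausible mean for every node and return the index of the largest.

    Assumption: each node's sampling variance is prior_var / (n + 1), so nodes
    that have been tried less often are sampled more widely (more exploration).
    """
    samples = [
        rng.gauss(m, (prior_var / (n + 1)) ** 0.5)
        for m, n in zip(means, counts)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, rounds=2000, noise_sd=0.1, seed=0):
    """Simulate node selection against hypothetical per-node mean rewards."""
    rng = random.Random(seed)
    k = len(true_means)
    means = [0.0] * k   # running estimate of each node's mean reward
    counts = [0] * k    # number of times each node has been selected
    total_reward = 0.0
    for _ in range(rounds):
        node = gaussian_thompson_select(means, counts, rng)
        # Observed reward: the node's true mean plus Gaussian noise.
        reward = rng.gauss(true_means[node], noise_sd)
        counts[node] += 1
        # Incremental update of the sample mean.
        means[node] += (reward - means[node]) / counts[node]
        total_reward += reward
    # Realized regret: shortfall against always choosing the best node.
    regret = rounds * max(true_means) - total_reward
    return counts, means, regret


if __name__ == "__main__":
    counts, means, regret = run_bandit([0.2, 0.5, 0.8])
    print("selections per node:", counts)
    print("estimated means:", [round(m, 3) for m in means])
    print("realized regret:", round(regret, 2))
```

Because the sampling variance shrinks as a node accumulates observations, selections should concentrate on the best node over time. That concentration is the behavior behind sublinear regret, although this sketch demonstrates it only empirically and for the assumed settings.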


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b502/11244562/676e6f58675a/sensors-24-04104-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b502/11244562/bd86c97b67f5/sensors-24-04104-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b502/11244562/395e1a058122/sensors-24-04104-g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b502/11244562/f4854d0e3e8b/sensors-24-04104-g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b502/11244562/11d5ef0cadac/sensors-24-04104-g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b502/11244562/5aeae91cb322/sensors-24-04104-g006.jpg
