
Optimal Evolution Strategy for Continuous Strategy Games on Complex Networks via Reinforcement Learning.

Authors

Fan Litong, Yu Dengxiu, Hao Cheong Kang, Wang Zhen

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12827-12839. doi: 10.1109/TNNLS.2024.3453385.

DOI: 10.1109/TNNLS.2024.3453385
PMID: 39302801

Abstract

This article presents an optimal evolution strategy for continuous strategy games on complex networks via reinforcement learning (RL). Evolutionary game theory has usually assumed that all agents use the same selection intensity when interacting, ignoring differences in their learning ability and willingness to learn. Individuals are also reluctant to change their strategies too much. We therefore design an adaptive strategy updating framework, based on imitation dynamics, with varying selection intensities for continuous strategy games on complex networks, allowing agents to reach the optimal state and a higher cooperation level with minimal strategy changes. The optimal updating strategy is obtained from a coupled Hamilton-Jacobi-Bellman (HJB) equation by minimizing a performance function that aims to maximize individual payoffs while minimizing strategy changes. Furthermore, a value iteration (VI) RL algorithm is proposed to approximate the HJB solutions and learn the optimal strategy updating rules. The RL algorithm employs actor and critic neural networks to approximate the strategy changes and the performance functions, respectively, with weights updated by gradient descent. The stability and convergence of the proposed methods are proved using a designed Lyapunov function. Simulations validate the convergence and effectiveness of the proposed methods across different games and complex networks.
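
The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch only, not the paper's HJB or actor-critic method: it assumes a ring network, a continuous donation game as the payoff model, a simple payoff-weighted imitation rule, and a running cost of the form "negative payoff plus `r` times squared strategy change". All function names, parameters (`b`, `c`, `r`, `dt`), and the update rule itself are hypothetical choices made for illustration.

```python
import numpy as np

def ring_adjacency(n, k=1):
    """Adjacency matrix of a ring; each node links to k neighbours per side."""
    a = np.zeros((n, n))
    for i in range(n):
        for d in range(1, k + 1):
            a[i, (i + d) % n] = a[i, (i - d) % n] = 1.0
    return a

def payoffs(x, adj, b=2.0, c=1.0):
    """Assumed continuous donation game: agent i pays c*x_i per neighbour
    and receives b*x_j from each neighbour j."""
    degree = adj.sum(axis=1)
    return b * (adj @ x) - c * degree * x

def imitation_step(x, adj, intensity, dt=0.1):
    """One Euler step of an assumed imitation rule with per-agent
    selection intensity (heterogeneous learning willingness)."""
    pi = payoffs(x, adj)
    # gain[i, j] = max(pi_j - pi_i, 0): only better-off neighbours are imitated
    gain = np.maximum(pi[None, :] - pi[:, None], 0.0)
    pull = (adj * gain * (x[None, :] - x[:, None])).sum(axis=1)
    u = intensity * pull                      # strategy change rate of each agent
    x_new = np.clip(x + dt * u, 0.0, 1.0)     # strategies stay in [0, 1]
    return x_new, u, pi

def run(n=20, steps=200, r=0.5, dt=0.1, seed=0):
    """Simulate the dynamics and accumulate the assumed running cost."""
    rng = np.random.default_rng(seed)
    adj = ring_adjacency(n)
    x = rng.uniform(0.0, 1.0, n)              # initial continuous strategies
    intensity = rng.uniform(0.2, 1.0, n)      # one selection intensity per agent
    cost = 0.0
    for _ in range(steps):
        x, u, pi = imitation_step(x, adj, intensity, dt)
        cost += dt * np.sum(-pi + r * u ** 2)  # low payoff and large changes are penalised
    return x, cost

if __name__ == "__main__":
    final_x, total_cost = run()
    print("mean strategy:", final_x.mean(), "accumulated cost:", total_cost)
```

In this sketch the selection intensities are fixed random numbers. In the paper's framework the strategy update is instead learned so as to minimize the performance function, which this example only evaluates.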



Cited by

1
Reverse game: from Nash equilibrium to network structure, number and probability of occurrence.
R Soc Open Sci. 2025 May 21;12(5):241928. doi: 10.1098/rsos.241928. eCollection 2025 May.