

Self-organizing neural architectures and cooperative learning in a multiagent environment.

Authors

Xiao Dan, Tan Ah-Hwee

Affiliation

School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore.

Publication

IEEE Trans Syst Man Cybern B Cybern. 2007 Dec;37(6):1567-80. doi: 10.1109/tsmcb.2007.907040.

DOI: 10.1109/tsmcb.2007.907040
PMID: 18179074
Abstract

Temporal-Difference-Fusion Architecture for Learning, Cognition, and Navigation (TD-FALCON) is a generalization of adaptive resonance theory (a class of self-organizing neural networks) that incorporates TD methods for real-time reinforcement learning. In this paper, we investigate how a team of TD-FALCON networks may cooperate to learn and function in a dynamic multiagent environment based on minefield navigation and predator/prey pursuit tasks. Experiments on the navigation task demonstrate that TD-FALCON agent teams are able to adapt and function well in a multiagent environment without an explicit mechanism of collaboration. In comparison, traditional Q-learning agents using gradient-descent-based feedforward neural networks, trained with the standard backpropagation and the resilient-propagation (RPROP) algorithms, produce a significantly poorer level of performance. For the predator/prey pursuit task, we experiment with various cooperative strategies and find that a combination of a high-level compressed state representation and a hybrid reward function produces the best results. Using the same cooperative strategy, the TD-FALCON team also outperforms the RPROP-based reinforcement learners in terms of both task completion rate and learning efficiency.
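The temporal-difference (TD) update underlying both TD-FALCON and the Q-learning baselines can be illustrated with a minimal tabular sketch. This is not the authors' TD-FALCON implementation (TD-FALCON replaces the lookup table with a self-organizing, ART-based network, and the real environments are the minefield and pursuit tasks); the function below is only the standard Q-learning rule the abstract refers to, with illustrative parameter names.

```python
# Minimal tabular Q-learning sketch of the TD update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# Hypothetical toy example; not the TD-FALCON architecture itself.

def td_q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning TD step to the table Q and return the new value."""
    # Estimate of the best achievable value from the successor state s'.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    # Move Q(s, a) toward the bootstrapped TD target r + gamma * best_next.
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

# Toy usage: a single reward-1 transition pulls Q(s, a) from 0 toward 1.
Q = {}
actions = ["left", "right"]
td_q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
print(round(Q[(0, "right")], 3))  # -> 0.1
```

In the paper's baselines, this same update drives a feedforward network trained by backpropagation or RPROP instead of a table; TD-FALCON instead learns the value function through adaptive resonance category nodes.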


Similar Articles

1. Self-organizing neural architectures and cooperative learning in a multiagent environment.
   IEEE Trans Syst Man Cybern B Cybern. 2007 Dec;37(6):1567-80. doi: 10.1109/tsmcb.2007.907040.
2. Integrating temporal difference methods and self-organizing neural networks for reinforcement learning with delayed evaluative feedback.
   IEEE Trans Neural Netw. 2008 Feb;19(2):230-44. doi: 10.1109/TNN.2007.905839.
3. Self-organizing neural networks integrating domain knowledge and reinforcement learning.
   IEEE Trans Neural Netw Learn Syst. 2015 May;26(5):889-902. doi: 10.1109/TNNLS.2014.2327636.
4. Dynamic extreme learning machine and its approximation capability.
   IEEE Trans Cybern. 2013 Dec;43(6):2054-65. doi: 10.1109/TCYB.2013.2239987.
5. A study on expertise of agents and its effects on cooperative Q-learning.
   IEEE Trans Syst Man Cybern B Cybern. 2007 Apr;37(2):398-409. doi: 10.1109/tsmcb.2006.883264.
6. Magnified gradient function with deterministic weight modification in adaptive learning.
   IEEE Trans Neural Netw. 2004 Nov;15(6):1411-23. doi: 10.1109/TNN.2004.836237.
7. Fuzzy OLAP association rules mining-based modular reinforcement learning approach for multiagent systems.
   IEEE Trans Syst Man Cybern B Cybern. 2005 Apr;35(2):326-38. doi: 10.1109/tsmcb.2004.843278.
8. Decision manifolds--a supervised learning algorithm based on self-organization.
   IEEE Trans Neural Netw. 2008 Sep;19(9):1518-30. doi: 10.1109/TNN.2008.2000449.
9. Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence.
   IEEE Trans Neural Netw Learn Syst. 2012 Apr;23(4):541-51. doi: 10.1109/TNNLS.2012.2183613.
10. Efficient training algorithms for a class of shunting inhibitory convolutional neural networks.
   IEEE Trans Neural Netw. 2005 May;16(3):541-56. doi: 10.1109/TNN.2005.845144.