Liu Feng, Li Dongqi, Gao Jian
School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China.
Kunming Precision Machinery Research Institute, Kunming, China.
Front Neurorobot. 2024 May 7;18:1364587. doi: 10.3389/fnbot.2024.1364587. eCollection 2024.
Multiagent reinforcement learning (MARL) has been widely adopted for its exceptional ability to solve multiagent decision-making problems. To further enhance learning efficiency, knowledge transfer algorithms have been developed, among which experience-sharing-based and action-advising-based transfer strategies are the mainstream. However, although both strategies have many successful applications, neither is flawless. The long-established action-advising-based methods (KT-AA, short for knowledge transfer based on action advising) suffer from unsatisfactory data efficiency and scalability. The newly proposed experience-sharing-based knowledge transfer methods (KT-ES), while partially overcoming the shortcomings of KT-AA, cannot correct specific bad decisions in the later learning stage. To leverage the strengths of both KT-AA and KT-ES, this study proposes KT-Hybrid, a hybrid knowledge transfer approach. In the early learning phase, KT-ES is employed, exploiting its superior data efficiency to raise the policy to a basic level as quickly as possible. In the later phase, KT-AA is applied to correct the specific errors made by the basic policy and further improve performance. Simulations demonstrate that the proposed KT-Hybrid outperforms well-received action-advising-based and experience-sharing-based methods.
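The two-phase schedule described above can be sketched minimally as follows. This is an illustrative assumption of how the phase switch might be scheduled, not the paper's actual implementation: the switch criterion (`switch_episode`) and the mode labels are hypothetical placeholders.

```python
def kt_hybrid_mode(episode: int, switch_episode: int = 500) -> str:
    """Select the knowledge-transfer mode for the current episode.

    Early phase: experience sharing (KT-ES), for better data efficiency
    while raising the policy to a basic level.
    Later phase: action advising (KT-AA), to correct specific bad
    decisions made by the basic policy.

    The episode-count threshold is a hypothetical switch criterion;
    the paper may use a different condition (e.g., a performance test).
    """
    return "KT-ES" if episode < switch_episode else "KT-AA"


if __name__ == "__main__":
    # The transfer strategy changes once, from sharing to advising.
    for ep in (0, 499, 500, 1000):
        print(ep, kt_hybrid_mode(ep))
```

In practice the switch point could instead be triggered by a learning-progress measure (e.g., a plateau in average return), which would match the abstract's notion of reaching a "basic level" more directly than a fixed episode count.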