Yuan Lei, Li Lihe, Zhang Ziqian, Zhang Fuxiang, Guan Cong, Yu Yang
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6326-6340. doi: 10.1109/TNNLS.2024.3394513. Epub 2025 Apr 4.
Cooperative multiagent reinforcement learning (MARL) has attracted significant attention and has the potential for many real-world applications. Prior work mainly focuses on improving coordination ability from different aspects (e.g., nonstationarity and credit assignment) in single-task or multitask scenarios, ignoring streams of tasks that arrive in a continual manner. This gap leaves continual coordination largely unexplored, both in problem formulation and in efficient algorithm design. To address this issue, this article proposes multiagent continual coordination via progressive task contextualization (MACPro). The key idea is to learn a factorized policy with shared feature-extraction layers but separate, independent task heads, each specializing in a specific class of tasks. The task heads can be progressively expanded based on the learned task contextualization. Moreover, to fit the popular centralized training with decentralized execution (CTDE) paradigm in MARL, each agent learns to predict and adopt the most relevant policy head from local information in a decentralized manner. We show on multiple multiagent benchmarks that existing continual learning methods fail, whereas MACPro achieves close-to-optimal performance. Further results demonstrate the effectiveness of MACPro from multiple aspects, such as its high generalization ability.
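The factorized-policy idea in the abstract (a shared feature extractor, progressively added task heads, and decentralized head selection from local context) can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the paper's actual architecture: the class name `FactorizedPolicy`, the random linear layers, and the nearest-context head-selection rule are all assumptions for illustration.

```python
import math
import random

class FactorizedPolicy:
    """Toy sketch of a MACPro-style factorized policy (hypothetical):
    shared feature-extraction layer + independent per-task-class heads."""

    def __init__(self, obs_dim, feat_dim, n_actions, seed=0):
        self.rng = random.Random(seed)
        # Shared trunk: obs_dim x feat_dim random linear layer (untrained toy).
        self.W_shared = [[self.rng.gauss(0, 0.1) for _ in range(feat_dim)]
                         for _ in range(obs_dim)]
        self.heads = []      # one feat_dim x n_actions matrix per task class
        self.contexts = []   # one task-context embedding per head
        self.feat_dim = feat_dim
        self.n_actions = n_actions

    def add_head(self, context):
        """Progressive expansion: add a new head for a new task class."""
        head = [[self.rng.gauss(0, 0.1) for _ in range(self.n_actions)]
                for _ in range(self.feat_dim)]
        self.heads.append(head)
        self.contexts.append(list(context))
        return len(self.heads) - 1

    def select_head(self, local_context):
        """Decentralized selection: pick the head whose stored task context
        is closest (Euclidean) to the context inferred from local info."""
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        dists = [dist(local_context, c) for c in self.contexts]
        return dists.index(min(dists))

    def act(self, obs, local_context):
        # Shared feature extraction, then the selected task-specific head.
        feat = [math.tanh(sum(o * w for o, w in zip(obs, col)))
                for col in zip(*self.W_shared)]
        k = self.select_head(local_context)
        logits = [sum(f * w for f, w in zip(feat, col))
                  for col in zip(*self.heads[k])]
        return logits.index(max(logits)), k

policy = FactorizedPolicy(obs_dim=4, feat_dim=8, n_actions=3)
policy.add_head(context=[1.0, 0.0])   # head 0 for task class A
policy.add_head(context=[0.0, 1.0])   # head 1 for task class B
action, head = policy.act([0.5, -0.2, 0.1, 0.3], local_context=[0.1, 0.9])
print(head)  # context [0.1, 0.9] lies closest to task class B's context
```

In the paper the task contexts are learned (task contextualization) and the heads are trained under CTDE; here both are replaced by fixed vectors and random weights purely to show the control flow of progressive expansion and decentralized head selection.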