Paper
arXiv
Agent
UrbanTraffic
Title
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali
Published
2025/5/30 14:40:19
Source type
preprint
Language
en
Abstract

Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems, where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods derive roles exclusively from an agent's past experience during training, neglecting the roles' influence on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes this objective in two stages: contrastive learning on past trajectories first derives intermediate roles, which then shape intrinsic rewards that, through a learned dynamics model, promote diversity in future behaviors across different roles. Benchmarking on the SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination and increasing win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.
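The abstract describes two components: a contrastive (mutual-information-based) objective that ties each agent's role to its observed trajectory, and an intrinsic reward that, via a learned dynamics model, makes different roles produce distinguishable future behavior. The sketch below is a minimal, hypothetical illustration of those two ideas in NumPy — an InfoNCE-style loss over trajectory/role embeddings and a reward that scores how much more predictive an agent's own role is than the average role. The function names, shapes, and the `dynamics` interface are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def info_nce_loss(traj_emb, role_emb, temperature=0.1):
    """Contrastive (InfoNCE-style) loss aligning each agent's trajectory
    embedding (row i of traj_emb) with its own role embedding (row i of
    role_emb); the other roles in the batch serve as negatives."""
    # Cosine-similarity logits between every trajectory and every role.
    t = traj_emb / np.linalg.norm(traj_emb, axis=1, keepdims=True)
    r = role_emb / np.linalg.norm(role_emb, axis=1, keepdims=True)
    logits = t @ r.T / temperature                # (n_agents, n_agents)
    # Positive pairs sit on the diagonal; apply a row-wise log-softmax.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def role_diversity_reward(dynamics, state, action, own_role, all_roles):
    """Intrinsic reward proxy for I(role; future behavior): how much
    smaller the dynamics model's prediction error is under the agent's
    own role than under the average role. `dynamics(state, action, role)`
    is assumed to return a scalar next-state prediction error."""
    own_err = dynamics(state, action, own_role)
    mean_err = np.mean([dynamics(state, action, r) for r in all_roles])
    return mean_err - own_err  # positive when the own role is more predictive
```

For example, with perfectly aligned embeddings (`traj_emb == role_emb == np.eye(4)`) the contrastive loss is near zero, and a dynamics model whose error shrinks when conditioned on the correct role yields a positive intrinsic reward for that role.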

Metadata
arXiv ID: 2505.24265v4
Source: arXiv
Type: paper
Extraction status: raw
Keywords
Agent
UrbanTraffic
cs.MA