Paper
arXiv
Agent
UrbanTraffic
Title
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali
Published
2025/5/30 14:40:19
Source type
preprint
Language
en
Abstract

Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems, where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods derive roles exclusively from an agent's past experience during training, neglecting the roles' influence on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes this objective in two stages: contrastive learning on past trajectories first derives intermediate roles, which then shape intrinsic rewards that, through a learned dynamics model, promote diversity in future behaviors across different roles. Benchmarking on the SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination and increasing win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.
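The abstract describes two components: a contrastive (mutual-information-based) objective that ties each agent's role to its observed trajectory, and an intrinsic reward that, via a learned dynamics model, makes different roles produce distinguishable future behavior. The sketch below is a minimal, hypothetical illustration of those two ideas in NumPy — an InfoNCE-style loss over trajectory/role embeddings and a reward that scores how much more predictive an agent's own role is than the average role. The function names, shapes, and the `dynamics` interface are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def info_nce_loss(traj_emb, role_emb, temperature=0.1):
    """Contrastive (InfoNCE-style) loss aligning each agent's trajectory
    embedding (row i of traj_emb) with its own role embedding (row i of
    role_emb); the other roles in the batch serve as negatives."""
    # Cosine-similarity logits between every trajectory and every role.
    t = traj_emb / np.linalg.norm(traj_emb, axis=1, keepdims=True)
    r = role_emb / np.linalg.norm(role_emb, axis=1, keepdims=True)
    logits = t @ r.T / temperature                # (n_agents, n_agents)
    # Positive pairs sit on the diagonal; apply a row-wise log-softmax.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def role_diversity_reward(dynamics, state, action, own_role, all_roles):
    """Intrinsic reward proxy for I(role; future behavior): how much
    smaller the dynamics model's prediction error is under the agent's
    own role than under the average role. `dynamics(state, action, role)`
    is assumed to return a scalar next-state prediction error."""
    own_err = dynamics(state, action, own_role)
    mean_err = np.mean([dynamics(state, action, r) for r in all_roles])
    return mean_err - own_err  # positive when the own role is more predictive
```

For example, with perfectly aligned embeddings (`traj_emb == role_emb == np.eye(4)`) the contrastive loss is near zero, and a dynamics model whose error shrinks when conditioned on the correct role yields a positive intrinsic reward for that role.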

Metadata
arXiv ID: 2505.24265v4
Source: arXiv
Type: paper
Extraction status: raw
Keywords
Agent
UrbanTraffic
cs.MA