Paper
arXiv
Trajectory
Mobility
LLM
UrbanTraffic
English Title
DGLight: DQN-Guided GRPO Fine-Tuning of Large Language Models for Traffic Signal Control
Chenbo Yu
Published
2026/4/28 14:09:01
Source Type
preprint
Language
en
Abstract
English Original

Traffic signal control (TSC) plays a central role in reducing congestion and maintaining urban mobility. This dissertation introduces DGLight, a critic-guided reinforcement-learning framework for adapting a pretrained large language model to TSC. DGLight first trains a CoLight-based Deep Q-Network critic to estimate traffic-aware action values from structured intersection states, then uses the frozen critic to score candidate language-model actions and optimize the policy with Group Relative Policy Optimization (GRPO). The resulting controller maps traffic states to interpretable reasoning traces and signal decisions while learning from dense per-state supervision rather than raw cumulative environment rewards. Experiments on TSC benchmarks covering Jinan and Hangzhou show that DGLight is the strongest overall method among the compared LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets not used to fit the critic. Qualitative examples further show that the model's generated reasoning is interpretable and aligned with the chosen signal phase. The project code is available at https://github.com/yyccbb/FYP_LLMTSC.
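
The abstract describes the training signal only at a high level: a frozen critic scores candidate language-model actions, and GRPO updates the policy from those scores. The Python sketch that follows illustrates one plausible reading, built on several assumptions that do not come from the source:

- Each sampled completion is parsed into an integer signal phase.
- The critic's Q-value for that phase is used directly as the reward.
- Unparseable outputs receive a fixed penalty.
- Advantages are the rewards standardized within each group of completions for the same state. These advantages then feed a PPO-style clipped objective.

The function names, the toy critic values and the penalty are all hypothetical. The KL term to a reference model that GRPO usually includes is omitted. This is not the DGLight implementation.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical types: the real state encoding and phase set are not given in
# the abstract, so a dict state and an integer phase id are assumed here.
State = Dict[str, List[int]]
Phase = int


def critic_rewards(state: State,
                   phases: Sequence[Optional[Phase]],
                   critic_q: Callable[[State, Phase], float],
                   invalid_penalty: float = -1.0) -> List[float]:
    """Score each sampled completion's parsed phase with a frozen critic.

    `critic_q` stands in for the frozen DQN critic (assumption). `None` marks
    a completion whose phase could not be parsed; it gets an assumed penalty.
    """
    return [critic_q(state, p) if p is not None else invalid_penalty
            for p in phases]


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of completions for the same state."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new: Sequence[float],
                      logp_old: Sequence[float],
                      advantages: Sequence[float],
                      clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective (to maximize), averaged over the group.

    Uses one sequence-level log-probability per completion for brevity and
    omits the KL penalty to a reference model that GRPO usually adds.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Toy example with made-up critic values for one intersection state.
toy_q = {0: 0.1, 1: 0.4, 2: 0.9, 3: 0.2}
state = {"queue_lengths": [3, 5, 8, 1]}
phases = [2, 1, None, 2]  # phases parsed from four sampled completions
rewards = critic_rewards(state, phases, lambda s, p: toy_q[p])
adv = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in adv])
```

Because advantages are standardized within each group, only the relative ranking of the critic's scores for the same traffic state matters. This is one way to read the abstract's contrast between dense per-state supervision and raw cumulative environment reward.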

Metadata
arXiv: 2604.25259v1
Source: arXiv
Type: Paper
Extraction Status: raw
Keywords
Trajectory
Mobility
LLM
UrbanTraffic
cs.LG