UrbanComp Lab | Learning Resource Library

Paper

arXiv

Trajectory

Mobility

LLM

UrbanTraffic

English Title

DGLight: DQN-Guided GRPO Fine-Tuning of Large Language Models for Traffic Signal Control

Chenbo Yu

Published

2026/4/28 14:09:01

Source Type

preprint

Abstract

English Original

Traffic signal control (TSC) plays a central role in reducing congestion and maintaining urban mobility. This dissertation introduces DGLight, a critic-guided reinforcement-learning framework for adapting a pretrained large language model to TSC. DGLight first trains a CoLight-based Deep Q-Network critic to estimate traffic-aware action values from structured intersection states, then uses the frozen critic to score candidate language-model actions and optimize the policy with Group Relative Policy Optimization (GRPO). The resulting controller maps traffic states to interpretable reasoning traces and signal decisions while learning from dense per-state supervision rather than raw cumulative environment rewards. Experiments on TSC benchmarks covering Jinan and Hangzhou show that DGLight is the strongest overall method among the compared LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets not used to fit the critic. Qualitative examples further show that the model's generated reasoning is interpretable and aligned with the chosen signal phase. The project code is available at https://github.com/yyccbb/FYP_LLMTSC.
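
The critic-scoring step described in the abstract can be illustrated with a minimal sketch. It assumes that each sampled response's reward is simply the frozen critic's Q-value for the phase that response selects, and that advantages follow the usual GRPO group standardization (reward minus group mean, divided by group standard deviation). The function names, the four-phase setup, and all numbers are hypothetical and are not taken from the paper or its repository.

```python
from statistics import mean, pstdev


def score_candidates(q_values, candidate_phases):
    """Look up the critic's Q-value for each candidate phase index (assumed reward)."""
    return [q_values[p] for p in candidate_phases]


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses sampled for the same state."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical critic output for one intersection state with four signal phases.
q_values = [1.0, 3.0, 2.0, 0.0]
# Hypothetical phases chosen by four sampled LLM responses for that state.
candidate_phases = [1, 1, 3, 2]

rewards = score_candidates(q_values, candidate_phases)
advantages = group_relative_advantages(rewards)

for phase, reward, adv in zip(candidate_phases, rewards, advantages):
    print(f"phase={phase} reward={reward:.1f} advantage={adv:+.3f}")
```

In this sketch, responses that pick the phase the critic values most receive positive advantages and the lowest-valued choice receives a negative one, so every sampled response gets a per-state learning signal without waiting for cumulative environment rewards.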

Resource Links

Paper PDF: arxiv.org/pdf/2604.25259v1
Original source page: arxiv.org/abs/2604.25259v1

Metadata

arXiv: 2604.25259v1

Source: arXiv

Type: Paper

Extraction status: raw

Keywords

Trajectory

Mobility

LLM

UrbanTraffic

cs.LG