Paper
arXiv
SpatialIntelligence
Trajectory
Mobility
Agent
UrbanTraffic
English Title
Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning
Joydeep Chandra, Satyam Kumar Navneet, Aleksandr Algazinov, Yong Zhang
Published
2026/2/5 02:10:59
Source type
preprint
Language
en
Abstract

English Original

Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions, all while providing reliability guarantees. We present STREAM-RL, a unified framework that introduces three novel algorithmic contributions: (1) PU-GAT+, an Uncertainty-Guided Adaptive Conformal Forecaster that uses prediction uncertainty to dynamically reweight graph attention via confidence-monotonic attention, achieving distribution-free coverage guarantees; (2) CRFN-BY, a Conformal Residual Flow Network that models uncertainty-normalized residuals via normalizing flows, with Benjamini-Yekutieli false discovery rate (FDR) control under arbitrary dependence; and (3) LyCon-WRL+, an Uncertainty-Guided Safe World-Model RL agent with Lyapunov stability certificates, certified Lipschitz bounds, and uncertainty-propagated imagination rollouts. To our knowledge, this is the first framework to propagate calibrated uncertainty from forecasting through anomaly detection to safe policy learning with end-to-end theoretical guarantees. Experiments on multiple real-world traffic trajectory datasets demonstrate that STREAM-RL achieves 91.4% coverage efficiency, controls FDR at 4.1% under verified dependence, and improves the safety rate to 95.2% (versus 69% for standard PPO) while achieving higher reward, with 23 ms end-to-end inference latency.
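The abstract names two textbook building blocks, split-conformal coverage and Benjamini-Yekutieli (BY) FDR control, without stating them. The sketch below is a minimal illustration of those standard procedures, not the authors' PU-GAT+ or CRFN-BY. It assumes uncertainty-normalized nonconformity scores of the form |y - y_hat| / sigma_hat (our reading of "uncertainty-normalized residuals"), and all function names and data are hypothetical. A calibration-set quantile gives an interval half-width with at least 1 - alpha marginal coverage under exchangeability; conformal p-values then score test points, and the BY step-up rule flags anomalies while keeping FDR at or below q for arbitrarily dependent p-values.

```python
import math
from typing import List, Sequence


def conformal_quantile(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.

    An interval y_hat +/- threshold * sigma_hat then covers a new exchangeable
    point with probability >= 1 - alpha (scores assumed sigma-normalized).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def conformal_p_value(cal_scores: Sequence[float], test_score: float) -> float:
    """Conformal p-value: (1 + #{calibration scores >= test score}) / (n + 1)."""
    n = len(cal_scores)
    return (1 + sum(s >= test_score for s in cal_scores)) / (n + 1)


def benjamini_yekutieli(p_values: Sequence[float], q: float) -> List[bool]:
    """BY step-up procedure; controls FDR <= q under arbitrary dependence."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic-sum correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank * q / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical calibration scores |y - y_hat| / sigma_hat (199 values in (0, 2)).
    cal = [i / 100 for i in range(1, 200)]
    half_width = conformal_quantile(cal, alpha=0.1)
    print("90% interval half-width in sigma units:", half_width)
    # Hypothetical test-time scores at four road sensors.
    test_scores = [0.4, 5.0, 1.0, 6.2]
    pvals = [conformal_p_value(cal, s) for s in test_scores]
    print("anomaly flags:", benjamini_yekutieli(pvals, q=0.1))
```

BY is used here rather than plain Benjamini-Hochberg because its extra 1/c_m factor is what buys validity without any independence assumption, matching the "arbitrary dependence" claim; the price is a more conservative threshold.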

Metadata
arXiv: 2602.04821v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
SpatialIntelligence
Trajectory
Mobility
Agent
UrbanTraffic
cs.LG
cs.AI