Leveraging the general world knowledge of Large Language Models (LLMs) holds significant promise for improving the ability of autonomous driving systems to handle rare and complex scenarios. While integrating LLMs into Vision-Language-Action (VLA) models has yielded state-of-the-art performance, their massive parameter counts pose severe challenges for latency-sensitive and energy-constrained deployment. Distilling LLM knowledge into a compact driving model offers a compelling way to retain these reasoning capabilities while maintaining a manageable computational footprint. Although previous works have demonstrated the efficacy of distillation, these efforts have primarily focused on relatively simple scenarios and open-loop evaluations. In this work, we therefore investigate LLM distillation in more complex, interactive scenarios under closed-loop evaluation. We demonstrate that, through a combination of latent feature distillation and ground-truth trajectory supervision, an efficient vision-only student model, \textbf{Orion-Lite}, can even surpass its massive VLA teacher, ORION, setting a new state-of-the-art on the rigorous Bench2Drive benchmark with a Driving Score of 80.6. Ultimately, this reveals that vision-only architectures still possess significant, untapped potential for high-performance reactive planning.
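The combined objective mentioned above (latent feature distillation plus ground-truth trajectory supervision) can be sketched as follows. This is a minimal illustration in PyTorch, not the actual Orion-Lite implementation: the tensor shapes, the choice of MSE/L1 losses, and the weighting parameter \texttt{alpha} are all assumptions for exposition.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, pred_traj, gt_traj, alpha=0.5):
    """Combined objective: latent feature distillation + trajectory supervision.

    student_feat / teacher_feat: (B, D) latent features; the teacher (a frozen
        VLA model) provides the target latent space and is detached.
    pred_traj / gt_traj: (B, T, 2) planned vs. expert waypoints.
    alpha: hypothetical weight balancing the two terms (not from the paper).
    """
    # Latent feature distillation: pull the student's latent representation
    # toward the frozen teacher's (MSE is one common choice).
    l_feat = F.mse_loss(student_feat, teacher_feat.detach())
    # Ground-truth trajectory supervision: regress expert waypoints directly.
    l_traj = F.l1_loss(pred_traj, gt_traj)
    return alpha * l_feat + (1.0 - alpha) * l_traj
```

In practice the two terms would be computed per training batch and backpropagated only through the student, so the teacher's cost is paid once offline rather than at deployment time.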