UrbanComp Lab | Learning Resource Library

Paper

arXiv

Trajectory

Mobility

LLM

Agent

Title

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Zhiheng Song, Jingshuai Zhang, Chuan Qin, Chao Wang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxiang Chu, Hengshu Zhu

Published

2026/2/26 13:39:38

Source Type

preprint

Abstract

Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. MobilityBench is constructed from large-scale, anonymized real user queries collected from Amap and covers a broad spectrum of route-planning intents across multiple cities worldwide. To enable reproducible, end-to-end evaluation, we design a deterministic API-replay sandbox that eliminates environmental variance from live services. We further propose a multi-dimensional evaluation protocol centered on outcome validity, complemented by assessments of instruction understanding, planning, tool use, and efficiency. Using MobilityBench, we evaluate multiple LLM-based route-planning agents across diverse real-world mobility scenarios and provide an in-depth analysis of their behaviors and performance. Our findings reveal that current models perform competently on Basic information retrieval and Route Planning tasks, yet struggle considerably with Preference-Constrained Route Planning, underscoring significant room for improvement in personalized mobility applications. We publicly release the benchmark data, evaluation toolkit, and documentation at https://github.com/AMAP-ML/MobilityBench.
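The deterministic API-replay idea mentioned in the abstract can be pictured with a minimal sketch. It is an illustration only and is not taken from the MobilityBench toolkit: the class `ReplaySandbox`, its methods, the `route` tool name, and the recorded payload are all hypothetical. The sketch assumes that responses are recorded once and later looked up by a hash of the tool name and its arguments, so repeated evaluation runs see identical results.

```python
import copy
import hashlib
import json


class ReplaySandbox:
    """Hypothetical replay cache mapping (tool name, arguments) to a recorded response."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tool, args):
        # Canonical JSON with sorted keys, so argument order does not change the key.
        payload = json.dumps({"tool": tool, "args": args},
                             sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, tool, args, response):
        # Store a copy so later mutation by the caller cannot alter the recording.
        self._store[self._key(tool, args)] = copy.deepcopy(response)

    def call(self, tool, args):
        # Replay only: an unrecorded request fails loudly instead of hitting a live service.
        key = self._key(tool, args)
        if key not in self._store:
            raise KeyError(f"no recorded response for {tool} with {args}")
        return copy.deepcopy(self._store[key])


sandbox = ReplaySandbox()
# Made-up recording for illustration; not real map data.
sandbox.record("route", {"origin": "A", "destination": "B"}, {"distance_m": 1200})

first = sandbox.call("route", {"destination": "B", "origin": "A"})
second = sandbox.call("route", {"origin": "A", "destination": "B"})
```

Both calls return the same recorded response regardless of argument order, and a request that was never recorded raises an error, which is one way to keep an evaluation run from silently depending on a live service.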

Resource Links

Paper PDF: arxiv.org/pdf/2602.22638v2

Original source page: arxiv.org/abs/2602.22638v2

Metadata

arXiv: 2602.22638v2

Source: arXiv

Type: Paper

Extraction status: raw

Keywords

Trajectory

Mobility

LLM

Agent

cs.AI