UrbanComp Lab | Learning Resources


Paper

arXiv

GeoAI

GIS

SpatialIntelligence

Trajectory

Mobility

LLM

GeoLargeModel

GeoFoundationModel

Multimodal

GeoMultimodal


English Title

Enabling Intrinsic Reasoning over Dense Geospatial Embeddings with DFR-Gemma

Xuechen Zhang, Aviv Slobodkin, Joydeep Paul, Mandar Sharma, Samet Oymak, Shravya Shetty, Gautam Prasad

Published

2026/4/9 02:31:38

Source type

preprint

Language

Abstract


English Original

Representation learning for geospatial and spatio-temporal data plays a critical role in enabling general-purpose geospatial intelligence. Recent geospatial foundation models, such as the Population Dynamics Foundation Model (PDFM), encode complex population and mobility dynamics into compact embeddings. However, their integration with Large Language Models (LLMs) remains limited. Existing approaches to LLM integration treat these embeddings as retrieval indices or convert them into textual descriptions for reasoning, introducing redundancy, token inefficiency, and numerical inaccuracies. We propose Direct Feature Reasoning-Gemma (DFR-Gemma), a novel framework that enables LLMs to reason directly over dense geospatial embeddings. DFR aligns high-dimensional embeddings with the latent space of an LLM via a lightweight projector, allowing embeddings to be injected as semantic tokens alongside natural language instructions. This design eliminates the need for intermediate textual representations and enables intrinsic reasoning over spatial features. To evaluate this paradigm, we introduce a multi-task geospatial benchmark that pairs embeddings with diverse question-answer tasks, including feature querying, comparison, and semantic description. Experimental results show that DFR allows LLMs to decode latent spatial patterns and perform accurate zero-shot reasoning across tasks, while significantly improving efficiency compared to text-based baselines. Our results demonstrate that treating embeddings as primary data inputs provides a more direct, efficient, and scalable approach to multimodal geospatial intelligence.
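The abstract describes the projector only as "lightweight" and does not give its architecture. As a rough illustration of the embed-and-inject idea, the sketch below assumes a LLaVA-style two-layer MLP (a hypothetical `EmbeddingProjector`) that maps one embedding vector to a fixed number of soft tokens, then splices those tokens into the instruction's token embeddings at a placeholder position (a hypothetical `inject_soft_tokens`). The dimensions, the `<geo>` placeholder, and the random untrained weights are all illustrative assumptions; this is not DFR-Gemma's actual code.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output unit


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Uniform init scaled by 1/sqrt(fan_in); untrained, for illustration only."""
    bound = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gelu(x: float) -> float:
    # tanh approximation of GELU, a common MLP activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class EmbeddingProjector:
    """Hypothetical 2-layer MLP: one geospatial embedding -> num_tokens soft tokens."""

    def __init__(self, emb_dim: int, llm_dim: int, num_tokens: int,
                 hidden_dim: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.llm_dim = llm_dim
        self.num_tokens = num_tokens
        self.w1 = random_matrix(hidden_dim, emb_dim, rng)
        self.w2 = random_matrix(num_tokens * llm_dim, hidden_dim, rng)

    def __call__(self, embedding: Vector) -> List[Vector]:
        if len(embedding) != self.emb_dim:
            raise ValueError(f"expected {self.emb_dim} values, got {len(embedding)}")
        hidden = [gelu(h) for h in matvec(self.w1, embedding)]
        flat = matvec(self.w2, hidden)
        # Reshape the flat output into num_tokens vectors of width llm_dim.
        return [flat[i * self.llm_dim:(i + 1) * self.llm_dim]
                for i in range(self.num_tokens)]


def inject_soft_tokens(text_embs: List[Vector], placeholder_index: int,
                       soft_tokens: List[Vector]) -> List[Vector]:
    """Replace the single placeholder position with the projected soft tokens."""
    return text_embs[:placeholder_index] + soft_tokens + text_embs[placeholder_index + 1:]


if __name__ == "__main__":
    projector = EmbeddingProjector(emb_dim=8, llm_dim=6, num_tokens=2)
    region_embedding = [0.1 * i for i in range(8)]  # stand-in for a region's embedding
    soft = projector(region_embedding)
    # Hypothetical 4-token instruction such as "Describe <geo> region ." with
    # <geo> at index 1; zero vectors stand in for real token embeddings.
    text = [[0.0] * 6 for _ in range(4)]
    sequence = inject_soft_tokens(text, placeholder_index=1, soft_tokens=soft)
    print(len(sequence), len(sequence[0]))  # 5 6
```

Because the embedding always occupies `num_tokens` positions, its token cost in this sketch stays fixed regardless of the embedding's dimension, which is consistent with the token-efficiency argument in the abstract. In a real system the projector would be trained and the spliced sequence fed to the LLM as input embeddings; that training loop is omitted here.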

Resource links

Paper PDF: arxiv.org/pdf/2604.07490v1

Original source page: arxiv.org/abs/2604.07490v1

Metadata

arXiv: 2604.07490v1

Source: arXiv

Type: Paper

Extraction status: raw

Keywords