Representation learning for geospatial and spatio-temporal data plays a critical role in enabling general-purpose geospatial intelligence. Recent geospatial foundation models, such as the Population Dynamics Foundation Model (PDFM), encode complex population and mobility dynamics into compact embeddings. However, the integration of these embeddings with Large Language Models (LLMs) remains limited. Existing approaches either treat the embeddings as retrieval indices or convert them into textual descriptions for reasoning, which introduces redundancy, token inefficiency, and numerical inaccuracies. We propose Direct Feature Reasoning-Gemma (DFR-Gemma), a novel framework that enables LLMs to reason directly over dense geospatial embeddings. DFR aligns high-dimensional embeddings with the latent space of an LLM via a lightweight projector, allowing embeddings to be injected as semantic tokens alongside natural language instructions. This design eliminates the need for intermediate textual representations and enables intrinsic reasoning over spatial features. To evaluate this paradigm, we introduce a multi-task geospatial benchmark that pairs embeddings with diverse question-answer tasks, including feature querying, comparison, and semantic description. Experimental results show that DFR allows LLMs to decode latent spatial patterns and perform accurate zero-shot reasoning across tasks, while significantly improving efficiency compared to text-based baselines. Our results demonstrate that treating embeddings as primary data inputs provides a more direct, efficient, and scalable approach to multimodal geospatial intelligence.
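The projection-and-injection idea described above can be illustrated with a minimal sketch. It is not the DFR-Gemma implementation: the abstract does not specify the projector architecture, the number of tokens produced per embedding, or where those tokens sit in the prompt. The sketch assumes a single linear layer, two soft tokens per embedding, and placement ahead of the instruction; every name and dimension here (`make_projector`, `project`, `build_input_sequence`, `EMBED_DIM`, `HIDDEN_DIM`, `NUM_SOFT_TOKENS`) is hypothetical, and the weights are random rather than trained.

```python
import random

# Illustrative sizes only (assumptions, not values from the paper).
EMBED_DIM = 8        # width of the geospatial embedding
HIDDEN_DIM = 4       # width of the LLM's token-embedding space
NUM_SOFT_TOKENS = 2  # soft tokens produced per geospatial embedding


def make_projector(in_dim, out_dim, num_tokens, seed=0):
    """Build an untrained linear projector: a weight matrix with
    num_tokens * out_dim rows and in_dim columns, plus a zero bias."""
    rng = random.Random(seed)
    rows = num_tokens * out_dim
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(rows)]
    bias = [0.0] * rows
    return weight, bias


def project(embedding, projector, out_dim):
    """Map one embedding to a list of soft tokens, each of length out_dim."""
    weight, bias = projector
    flat = [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weight, bias)]
    # Split the flat output into consecutive out_dim-sized token vectors.
    return [flat[i:i + out_dim] for i in range(0, len(flat), out_dim)]


def build_input_sequence(instruction_vectors, soft_tokens):
    """Place the soft tokens ahead of the instruction's token vectors
    (the ordering is an assumption of this sketch)."""
    return soft_tokens + instruction_vectors


# Stand-in inputs: a fake region embedding and three fake prompt-token vectors.
region_embedding = [0.1 * i for i in range(EMBED_DIM)]
instruction_vectors = [[0.0] * HIDDEN_DIM for _ in range(3)]

projector = make_projector(EMBED_DIM, HIDDEN_DIM, NUM_SOFT_TOKENS)
soft_tokens = project(region_embedding, projector, HIDDEN_DIM)
sequence = build_input_sequence(instruction_vectors, soft_tokens)

print(len(sequence), len(sequence[0]))  # 5 4
```

In a real system the combined sequence would be passed to the language model as input embeddings in place of token IDs, and the projector weights would be learned during alignment training. The point of the sketch is only that the raw embedding never passes through a textual description.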