Current Large Multimodal Models (LMMs) for Earth Observation typically neglect the critical "vertical" dimension, limiting their reasoning in complex remote sensing geometries and in disaster scenarios, where physical spatial structure often matters more than planar visual texture. To bridge this gap, we introduce a comprehensive evaluation framework dedicated to height-aware remote sensing understanding. First, to overcome the severe scarcity of annotated data, we develop a scalable, VLM-driven data generation pipeline that combines systematic prompt engineering with metadata extraction. This pipeline yields two complementary benchmarks: GeoHeight-Bench, for relative height analysis, and the more challenging GeoHeight-Bench+, for holistic, terrain-aware reasoning. Furthermore, to validate the necessity of height perception, we propose GeoHeightChat, the first height-aware remote sensing LMM baseline. As a strong proof of concept, GeoHeightChat demonstrates that fusing visual semantics with implicitly injected height-geometry features effectively mitigates the "vertical blind spot," unlocking a new paradigm of interactive height reasoning for existing optical models.