Title
GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing
Aoran Xiao, Shihao Cheng, Yonghao Xu, Yexian Ren, Hongruixuan Chen, Naoto Yokoya
Published
2026/4/10 10:59:38
Source type
preprint
Language
en
Abstract

Recent advances in multimodal large language models (MLLMs) have accelerated progress in domain-oriented AI, yet their development in geoscience and remote sensing (RS) remains constrained by distinctive challenges: wide-ranging disciplinary knowledge, heterogeneous sensor modalities, and a fragmented spectrum of tasks. To bridge these gaps, we introduce GeoMMBench, a comprehensive multimodal question-answering benchmark covering diverse RS disciplines, sensors, and tasks, enabling broader and more rigorous evaluation than prior benchmarks. Using GeoMMBench, we assess 36 open-source and proprietary large language models, uncovering systematic deficiencies in domain knowledge, perceptual grounding, and reasoning, capabilities essential for expert-level geospatial interpretation. Beyond evaluation, we propose GeoMMAgent, a multi-agent framework that strategically integrates retrieval, perception, and reasoning through domain-specific RS models and tools. Extensive experimental results demonstrate that GeoMMAgent significantly outperforms standalone LLMs, underscoring the importance of tool-augmented agents for dynamically tackling complex geoscience and RS challenges.

Metadata
arXiv:2604.08896v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
GeoAI
GIS
RemoteSensing
EarthObservation
LLM
Multimodal
GeoMultimodal
Agent
cs.CV