English Title
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence
Long Yuan, Fengran Mo, Kaiyu Huang, Wenjie Wang, Wangyuxuan Zhai, Xiaoyu Zhu, You Li, Jinan Xu, Jian-Yun Nie
Published
2025/3/21 00:45:48
Source type
preprint
Language
en
Abstract
English Original

The rapid advancement of multimodal large language models (LLMs) has opened new frontiers in artificial intelligence, enabling the integration of diverse large-scale data types such as text, images, and spatial information. In this paper, we explore the potential of multimodal LLMs (MLLM) for geospatial artificial intelligence (GeoAI), a field that leverages spatial data to address challenges in domains including Geospatial Semantics, Health Geography, Urban Geography, Urban Perception, and Remote Sensing. We propose a MLLM (OmniGeo) tailored to geospatial applications, capable of processing and analyzing heterogeneous data sources, including satellite imagery, geospatial metadata, and textual descriptions. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems. Results demonstrate that our model outperforms task-specific models and existing LLMs on diverse geospatial tasks, effectively addressing the multimodality nature while achieving competitive results on the zero-shot geospatial tasks. Our code will be released after publication.

Metadata
arXiv ID: 2503.16326v1
Source: arXiv
Type: Paper
Keywords
GeoAI
GIS
RemoteSensing
EarthObservation
SpatialIntelligence
LLM
Multimodal
GeoMultimodal
cs.AI