UrbanComp Lab | Learning Resource Library

Paper

arXiv

GeoAI

GIS

RemoteSensing

EarthObservation

SpatialIntelligence

LLM

Multimodal

GeoMultimodal

Chinese Title

OmniGeo：面向地理空间人工智能的多模态大语言模型

English Title

OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence

Long Yuan, Fengran Mo, Kaiyu Huang, Wenjie Wang, Wangyuxuan Zhai, Xiaoyu Zhu, You Li, Jinan Xu, Jian-Yun Nie

Published

2025/3/21 00:45:48

Source Type

preprint

Language

Abstract

The rapid advancement of multimodal large language models (MLLMs) has opened new frontiers in artificial intelligence, enabling the integration of diverse large-scale data types such as text, images, and spatial information. In this paper, we explore the potential of MLLMs for geospatial artificial intelligence (GeoAI), a field that leverages spatial data to address challenges in domains including Geospatial Semantics, Health Geography, Urban Geography, Urban Perception, and Remote Sensing. We propose an MLLM (OmniGeo) tailored to geospatial applications, capable of processing and analyzing heterogeneous data sources, including satellite imagery, geospatial metadata, and textual descriptions. By combining the strengths of natural language understanding and spatial reasoning, our model improves both instruction following and the accuracy of GeoAI systems. Results demonstrate that our model outperforms task-specific models and existing LLMs on diverse geospatial tasks, effectively addressing their multimodal nature while achieving competitive results on zero-shot geospatial tasks. Our code will be released after publication.

Resource Links

Paper PDF: arxiv.org/pdf/2503.16326v1
Original source page: arxiv.org/abs/2503.16326v1

Metadata

arXiv: 2503.16326v1

Source: arXiv

Type: Paper

Extraction status: raw

Keywords