UrbanComp Lab | Learning Resource Library

Paper

arXiv

GeoAI

GIS

Multimodal

GeoMultimodal

English Title

UNIGEOCLIP: Unified Geospatial Contrastive Learning

Guillaume Astruc, Eduard Trulls, Jan Hosang, Loic Landrieu, Paul-Edouard Sarlin

Published

2026/4/14 00:14:49

Source type

preprint

Abstract

English Original

The growing availability of co-located geospatial data spanning aerial imagery, street-level views, elevation models, text, and geographic coordinates offers a unique opportunity for multimodal representation learning. We introduce UNIGEOCLIP, a massively multimodal contrastive framework to jointly align five complementary geospatial modalities in a single unified embedding space. Unlike prior approaches that fuse modalities or rely on a central pivot representation, our method performs all-to-all contrastive alignment, enabling seamless comparison, retrieval, and reasoning across arbitrary combinations of modalities. We further propose a scaled latitude-longitude encoder that improves spatial representation by capturing multi-scale geographic structure. Extensive experiments across downstream geospatial tasks demonstrate that UNIGEOCLIP consistently outperforms single-modality contrastive models and coordinate-only baselines, highlighting the benefits of holistic multimodal geospatial alignment. A reference implementation is available at https://gastruc.github.io/unigeoclip.
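
One way to picture the all-to-all alignment described in the abstract is a symmetric InfoNCE loss averaged over every pair of modalities, rather than only over pairs that include a pivot. The sketch below is a minimal pure-Python illustration under that assumption. The function names (`info_nce`, `all_to_all_loss`), the temperature of 0.07, and the equal weighting of pairs are illustrative choices, not details taken from the paper or its reference implementation.

```python
import math
from itertools import combinations


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of unit-norm embeddings.

    Row i of `a` and row i of `b` are assumed to describe the same location,
    so the diagonal of the similarity matrix holds the positive pairs.
    """
    n = len(a)
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / n

    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def all_to_all_loss(batches, temperature=0.07):
    """Average the symmetric InfoNCE loss over every unordered modality pair.

    `batches` maps a modality name to a list of embeddings, one per location,
    in the same location order for every modality (an assumption of this sketch).
    """
    names = sorted(batches)
    normed = {name: [normalize(v) for v in batches[name]] for name in names}
    pairs = list(combinations(names, 2))
    total = sum(info_nce(normed[p], normed[q], temperature) for p, q in pairs)
    return total / len(pairs)


# Toy data: three locations, five modalities (names follow the abstract).
# Each location gets the same one-hot embedding in every modality.
MODALITIES = ["aerial", "street", "elevation", "text", "coords"]
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = {name: [row[:] for row in one_hot] for name in MODALITIES}

# Break the correspondence for one modality by rotating its rows.
misaligned = dict(aligned)
misaligned["text"] = one_hot[1:] + one_hot[:1]

aligned_loss = all_to_all_loss(aligned)
misaligned_loss = all_to_all_loss(misaligned)
```

With five modalities this averages over ten pairs, so any two modalities can be compared directly after training. A pivot-based scheme would, by contrast, only include the four pairs that contain the pivot modality.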

Resource links

Paper PDF: arxiv.org/pdf/2604.11668v1

Original source page: arxiv.org/abs/2604.11668v1

Metadata

arXiv: 2604.11668v1

Source: arXiv

Type: Paper

Extraction status: raw

Keywords

GeoAI

GIS

Multimodal

GeoMultimodal

cs.CV