English Title
UNIGEOCLIP: Unified Geospatial Contrastive Learning
Guillaume Astruc, Eduard Trulls, Jan Hosang, Loic Landrieu, Paul-Edouard Sarlin
Published
2026/4/14 00:14:49
Source type
preprint
Language
en
Abstract

English Original

The growing availability of co-located geospatial data spanning aerial imagery, street-level views, elevation models, text, and geographic coordinates offers a unique opportunity for multimodal representation learning. We introduce UNIGEOCLIP, a massively multimodal contrastive framework to jointly align five complementary geospatial modalities in a single unified embedding space. Unlike prior approaches that fuse modalities or rely on a central pivot representation, our method performs all-to-all contrastive alignment, enabling seamless comparison, retrieval, and reasoning across arbitrary combinations of modalities. We further propose a scaled latitude-longitude encoder that improves spatial representation by capturing multi-scale geographic structure. Extensive experiments across downstream geospatial tasks demonstrate that UNIGEOCLIP consistently outperforms single-modality contrastive models and coordinate-only baselines, highlighting the benefits of holistic multimodal geospatial alignment. A reference implementation is available at https://gastruc.github.io/unigeoclip.
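
One way to read "all-to-all contrastive alignment" is that every pair of modalities gets its own contrastive term, rather than each modality being aligned only to a single pivot. The following is a minimal sketch under that reading, not the authors' implementation: the symmetric InfoNCE loss, the temperature value, the modality names, and all function names are assumptions for illustration, since the abstract does not specify them.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(batch_a, batch_b, temperature=0.07):
    """Symmetric InfoNCE: row i of batch_a and batch_b describe the same location."""
    a = [l2_normalize(v) for v in batch_a]
    b = [l2_normalize(v) for v in batch_b]
    logits = [
        [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        for va in a
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


def modality_pairs(names):
    """All unordered modality pairs; five modalities give ten pairs."""
    return list(combinations(sorted(names), 2))


def all_to_all_loss(batch_by_modality, temperature=0.07):
    """Average the pairwise loss over every pair of modalities (no pivot modality)."""
    pairs = modality_pairs(batch_by_modality)
    losses = [
        info_nce(batch_by_modality[m], batch_by_modality[n], temperature)
        for m, n in pairs
    ]
    return sum(losses) / len(pairs)


if __name__ == "__main__":
    # Hypothetical modality names and toy 2-D embeddings for two locations.
    names = ["aerial", "street", "elevation", "text", "coords"]
    toy = {name: [[1.0, 0.0], [0.0, 1.0]] for name in names}
    print(len(modality_pairs(names)), "pairs, loss =", all_to_all_loss(toy))
```

With a pivot design, five modalities would need only four pairwise terms; the all-to-all form above uses all ten, so any two modalities are compared directly rather than through an intermediate embedding.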

Metadata
arXiv: 2604.11668v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
GeoAI
GIS
Multimodal
GeoMultimodal
cs.CV