Integrating ground-level geospatial data with rich geographic context, such as OpenStreetMap (OSM), into remote sensing (RS) foundation models (FMs) is essential for advancing geospatial intelligence and supporting a broad spectrum of tasks. However, the modality gap between RS and OSM data, which spans differences in data structure, content, and spatial granularity, makes effective synergy highly challenging, and most existing RS FMs focus on imagery alone. To this end, this study presents GeoLink, a multimodal framework that leverages OSM data to enhance RS FMs during both the pretraining and downstream task stages. Specifically, GeoLink enhances RS self-supervised pretraining with multi-granularity learning signals derived from OSM data, using cross-modal spatial correlations to guide information interaction and collaboration between the two modalities. It also introduces image mask-reconstruction, which enables sparse input for efficient pretraining. For downstream tasks, GeoLink generates both unimodal and multimodal fine-grained encodings to support a wide range of applications, from common RS interpretation tasks such as land cover classification to more comprehensive geographic tasks such as urban function zone mapping. Extensive experiments show that incorporating OSM data during pretraining enhances the performance of the RS image encoder, while fusing RS and OSM data in downstream tasks improves the FM's adaptability to complex geographic scenarios. These results underscore the potential of multimodal synergy in advancing high-level geospatial artificial intelligence. Moreover, we find that spatial correlation plays a crucial role in enabling effective multimodal geospatial data integration. Code, checkpoints, and usage examples are released at https://github.com/bailubin/GeoLink_NeurIPS2025
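
To illustrate how mask-reconstruction can yield sparse input, the following is a minimal sketch of random patch masking, in which only the visible patches would be passed to the image encoder. It is not GeoLink's implementation: the function name `random_patch_mask`, the 224-pixel image size, the 16-pixel patch size, and the 0.75 mask ratio are all assumptions chosen for illustration, and the masking strategy actually used is not specified here.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into visible and masked subsets.

    Hypothetical helper for illustration only; not part of GeoLink.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])    # patches to be reconstructed
    visible = sorted(indices[num_masked:])   # sparse input for the encoder
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives a 14x14 grid.
grid = 224 // 16
visible, masked = random_patch_mask(grid * grid, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # prints "49 147"
```

Under these assumed settings the encoder would process 49 of 196 patches, which is where the efficiency gain of sparse input would come from.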