Self-supervised learning (SSL) has revolutionized representation learning in Remote Sensing (RS), advancing Geospatial Foundation Models (GFMs) that leverage vast unlabeled satellite imagery for diverse downstream tasks. Currently, GFMs primarily employ objectives such as contrastive learning or masked image modeling, owing to their proven success in learning transferable representations. However, generative diffusion models, which during image generation show the potential to capture the multi-grained semantics essential for RS tasks, remain underexplored for discriminative applications. This prompts the question: can generative diffusion models also excel and serve as GFMs with sufficient discriminative power? In this work, we answer this question with SatDiFuser, a framework that transforms a diffusion-based generative geospatial foundation model into a powerful pretraining tool for discriminative RS tasks. By systematically analyzing multi-stage, noise-dependent diffusion features, we develop three fusion strategies to effectively leverage these diverse representations. Extensive experiments on remote sensing benchmarks show that SatDiFuser outperforms state-of-the-art GFMs, achieving gains of up to +5.7% mIoU in semantic segmentation and +7.9% F1-score in classification, demonstrating that diffusion-based generative foundation models can rival or exceed discriminative GFMs. The source code is available at: https://github.com/yurujaja/SatDiFuser.