Pretraining and fine-tuning have emerged as a new paradigm in remote sensing image interpretation. Among these paradigms, Masked Autoencoder (MAE)-based pretraining stands out for its strong capability to learn general feature representations by reconstructing masked image regions. However, applying MAE to multispectral remote sensing images remains challenging due to complex backgrounds, indistinct targets, and the lack of semantic guidance during masking, which hinders the learning of underlying structures and meaningful spatial-spectral features. To address this, we propose a simple yet effective approach, Spectral Index-Guided MAE (SIGMAE), for multispectral image pretraining. The core idea is to incorporate domain-specific spectral indices as prior knowledge to guide dynamic token masking toward informative regions. SIGMAE introduces Semantic Saliency-Guided Dynamic Token Masking (SSDTM), a curriculum-style strategy that quantifies each patch's semantic richness and internal heterogeneity to adaptively select the most informative tokens during training. By prioritizing semantically salient regions and progressively increasing sample difficulty, SSDTM enhances spectrally rich and structurally aware representation learning, mitigates overfitting, and reduces redundant computation compared with random masking. Extensive experiments on five widely used datasets covering various downstream tasks, including scene classification, semantic segmentation, object extraction, and change detection, demonstrate that SIGMAE outperforms other pretrained geospatial foundation models. Moreover, it exhibits strong spatial-spectral reconstruction capability, even at a 90% mask ratio, and improves complex target recognition under limited labeled data. The source code and model weights will be released at https://github.com/zxk688/SIGMAE.
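To make the SSDTM idea concrete, the sketch below illustrates one plausible reading of spectral-index-guided masking: score each patch by a spectral index (NDVI is used here as an example index) plus internal heterogeneity, then mask the highest-scoring tokens, blending in random noise as a crude curriculum schedule. The band order, the NDVI choice, the additive score, and the noise-blending schedule are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def spectral_index_saliency(image, patch_size=16, eps=1e-6):
    """Score each patch by semantic richness (mean |NDVI|) plus internal
    heterogeneity (NDVI std). `image` is (H, W, C); NIR in channel 3 and
    red in channel 2 is an assumed band layout for illustration."""
    nir = image[..., 3].astype(float)
    red = image[..., 2].astype(float)
    ndvi = (nir - red) / (nir + red + eps)
    H, W = ndvi.shape
    gh, gw = H // patch_size, W // patch_size
    patches = ndvi[:gh * patch_size, :gw * patch_size]
    patches = patches.reshape(gh, patch_size, gw, patch_size)
    richness = np.abs(patches).mean(axis=(1, 3))   # semantic richness proxy
    heterogeneity = patches.std(axis=(1, 3))       # internal variability
    return (richness + heterogeneity).ravel()      # one score per token

def dynamic_token_mask(scores, mask_ratio=0.9, difficulty=0.0, rng=None):
    """Select token indices to mask. At difficulty 0 the most salient
    tokens are masked; as difficulty grows toward 1, random noise is
    blended in (a hypothetical curriculum-style schedule)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.random(scores.shape)
    blended = (1.0 - difficulty) * scores + difficulty * noise
    n_mask = int(round(mask_ratio * scores.size))
    return np.argsort(-blended)[:n_mask]           # highest-scoring tokens
```

For example, on a 64x64 image with 16x16 patches this yields 16 token scores, and a 0.5 mask ratio masks the 8 most salient tokens; the masked set can then be fed to a standard MAE encoder/decoder for reconstruction.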