Paper
arXiv
RemoteSensing
EarthObservation
Multimodal
GeoMultimodal
Chinese Title
SGMA:面向遥感不完整多模态数据的语义引导模态感知分割
English Title
SGMA: Semantic-Guided Modality-Aware Segmentation for Remote Sensing with Incomplete Multimodal Data
Lekang Wen, Liang Liao, Jing Xiao, Mi Wang
Published
2026/3/3 09:28:21
Source type
preprint
Language
en
Abstract
English Original

Multimodal semantic segmentation integrates complementary information from diverse sensors for remote sensing Earth observation. However, practical systems often encounter missing modalities due to sensor failures or incomplete coverage, a setting termed Incomplete Multimodal Semantic Segmentation (IMSS). IMSS faces three key challenges: (1) multimodal imbalance, where dominant modalities suppress fragile ones; (2) intra-class variation in scale, shape, and orientation across modalities; and (3) cross-modal heterogeneity, where conflicting cues produce inconsistent semantic responses. Existing methods rely on contrastive learning or joint optimization, which risk either over-alignment that discards modality-specific cues or imbalanced training that favors robust modalities, and they largely overlook intra-class variation and cross-modal heterogeneity. To address these limitations, we propose the Semantic-Guided Modality-Aware (SGMA) framework, which ensures balanced multimodal learning while reducing intra-class variation and reconciling cross-modal inconsistencies through semantic guidance. SGMA introduces two complementary plug-and-play modules: (1) the Semantic-Guided Fusion (SGF) module, which extracts multi-scale, class-wise semantic prototypes that capture consistent categorical representations across modalities, estimates per-modality robustness based on prototype-feature alignment, and performs adaptive fusion weighted by robustness scores to mitigate intra-class variation and cross-modal heterogeneity; and (2) the Modality-Aware Sampling (MAS) module, which leverages the robustness estimates from SGF to dynamically reweight training samples, prioritizing challenging samples from fragile modalities to address modality imbalance. Extensive experiments across multiple datasets and backbones demonstrate that SGMA consistently outperforms state-of-the-art methods, with particularly significant improvements in fragile modalities.
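The abstract describes SGF and MAS only at a high level, so the following is a rough illustrative sketch of the general idea, not the authors' implementation. It assumes a single scale, prototypes computed as per-class mean features pooled over the available modalities, cosine similarity as the prototype-feature alignment measure, a softmax over robustness scores for the fusion weights, and an inverse-robustness rule for sample reweighting. All function names, the temperature value, the toy features, and the modality names "optical" and "sar" are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def class_prototypes(features, labels, num_classes):
    """Per-class mean feature vector; None for classes with no pixels."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += feat[d]
    return [[s / c for s in row] if c else None for row, c in zip(sums, counts)]

def robustness(features, labels, prototypes):
    """Assumed score: mean cosine alignment of pixels with their class prototype."""
    sims = [cosine(f, prototypes[y]) for f, y in zip(features, labels)
            if prototypes[y] is not None]
    return sum(sims) / len(sims) if sims else 0.0

def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature is a hypothetical knob."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(modal_features, weights):
    """Per-pixel weighted sum of the modality features."""
    n, dim = len(modal_features[0]), len(modal_features[0][0])
    return [[sum(w * feats[i][d] for w, feats in zip(weights, modal_features))
             for d in range(dim)] for i in range(n)]

def sampling_weights(scores):
    """Assumed MAS-like rule: less robust modalities get larger weights."""
    raw = [1.0 - s for s in scores]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [r / total for r in raw]

# Toy data: 4 pixels, 2 classes, 2-dimensional features per modality.
labels = [0, 0, 1, 1]
modalities = {
    "optical": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    "sar": [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5], [0.6, 0.4]],
}
names = list(modalities)

# Shared prototypes pooled over every available modality.
pooled = [f for name in names for f in modalities[name]]
prototypes = class_prototypes(pooled, labels * len(names), num_classes=2)

scores = {name: robustness(modalities[name], labels, prototypes) for name in names}
fusion_w = softmax([scores[name] for name in names])
fused = fuse([modalities[name] for name in names], fusion_w)
sample_w = sampling_weights([scores[name] for name in names])

print(scores, fusion_w, sample_w)
```

In this toy setup the cleaner modality receives the larger fusion weight while the noisier one receives the larger sampling weight, which mirrors the division of labor the abstract attributes to SGF and MAS. A missing modality could be simulated by leaving it out of `modalities`, since every step operates on whatever modalities are present.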

Metadata
arXiv: 2603.02505v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
RemoteSensing
EarthObservation
Multimodal
GeoMultimodal
cs.CV