Title
Edge-Cloud Collaborative Reconstruction via Structure-Aware Latent Diffusion for Downstream Remote Sensing Perception
Yun Li, Xianju Li
Published
2026/4/28 15:28:39
Source type
preprint
Language
en
Abstract

The exponential surge in high-resolution remote sensing data faces a severe bottleneck in satellite-to-ground transmission. Limited downlink bandwidth forces the use of extremely high compression ratios, which irreversibly destroy high-frequency structural details essential for downstream machine perception tasks such as object detection. While current super-resolution (SR) techniques attempt to recover these details, regression-based methods often yield over-smoothed textures, and generative diffusion models frequently introduce structural hallucinations that mislead detection systems. To address this trade-off, we propose the Structure-Aware Latent Diffusion (SALD) framework, an asymmetric edge-cloud collaborative SR system. At the resource-constrained edge, the system decouples imagery into a highly compressed low-frequency payload and a lightweight soft structural prior. Transmitting this decoupled representation minimizes bandwidth consumption. On the powerful cloud side, we introduce a Structure-Gated Large Kernel (SGLK) module and a Semantic-Guidance Engine (SGE) within the diffusion backbone. These modules leverage the transmitted structural priors to gate large-kernel convolutions, effectively capturing long-range dependencies inherent in aerial scenes while actively suppressing generative hallucinations. Extensive experiments on both the MSCM and UCMerced datasets demonstrate that, even under extreme bandwidth constraints, SALD achieves superior perceptual quality (LPIPS) and significantly enhances downstream performance in both scene classification and small-target detection.
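
The edge/cloud split described in the abstract can be pictured with a toy sketch. The abstract does not specify the operators involved, so everything below is an assumption made for illustration: block averaging stands in for the compressed low-frequency payload, a normalised gradient magnitude stands in for the soft structural prior, a box filter stands in for a learned large-kernel convolution, and the residual form `y = x + g * LK(x)` is only one plausible way to gate it. All function names are hypothetical and none of this is the SALD implementation.

```python
# Toy sketch under stated assumptions; NOT the paper's method.
# Images are plain 2-D lists of floats so the example needs no dependencies.

def low_frequency_payload(img, factor):
    """Edge side (assumed): average non-overlapping factor x factor blocks."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [img[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def soft_structure_prior(img):
    """Edge side (assumed): gradient magnitude scaled to [0, 1].

    A real system would also quantise or compress this map to keep it
    lightweight; that step is omitted here.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = img[i][min(j + 1, w - 1)] - img[i][max(j - 1, 0)]
            gy = img[min(i + 1, h - 1)][j] - img[max(i - 1, 0)][j]
            mag[i][j] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in mag)
    if peak == 0:
        return mag  # flat image: no structure to report
    return [[v / peak for v in row] for row in mag]


def box_filter(feat, k):
    """Stand-in for a large-kernel convolution: k x k mean, replicated borders."""
    assert k % 2 == 1, "kernel size must be odd"
    h, w = len(feat), len(feat[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    total += feat[ii][jj]
            out[i][j] = total / (k * k)
    return out


def structure_gated_large_kernel(feat, prior, k=7):
    """Cloud side (assumed form): y = x + g * LK(x), with g taken from the prior.

    Where the prior is 0 the features pass through unchanged; where it is
    close to 1 the large-kernel response is added in full.
    """
    wide = box_filter(feat, k)
    return [[x + g * lk for x, g, lk in zip(fr, pr, wr)]
            for fr, pr, wr in zip(feat, prior, wide)]
```

In this sketch the gate only decides where the wide-context branch contributes, which loosely mirrors the abstract's idea of letting transmitted structure steer the large-kernel path; a trained model would learn both the kernel and the gating function rather than use fixed ones.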

Metadata
arXiv: 2604.25319v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
RemoteSensing
EarthObservation
cs.CV