Paper
arXiv
GeoAI
GIS
RemoteSensing
EarthObservation
English Title
Cryo-Bench: Benchmarking Foundation Models for Cryosphere Applications
Saurabh Kaushik, Lalit Maurya, Beth Tellman
Published
2026/3/2 16:05:56
Source Type
preprint
Language
en
Abstract
English Original

Geo-Foundation Models (GFMs) have been evaluated across diverse Earth observation tasks spanning multiple domains and have demonstrated strong potential for producing reliable maps even with sparse labels. However, benchmarking of GFMs for Cryosphere applications has remained limited, primarily due to the lack of suitable evaluation datasets. To address this gap, we introduce \textbf{Cryo-Bench}, a benchmark compiled to evaluate GFM performance across key Cryospheric components. Cryo-Bench includes debris-covered glaciers, glacial lakes, sea ice, and calving fronts, spanning multiple sensors and broad geographic regions. We evaluate 14 GFMs alongside UNet and ViT baselines to assess their advantages, limitations, and optimal usage strategies. With a frozen encoder, UNet achieves the highest average mIoU of \textbf{66.38}, followed by TerraMind at \textbf{64.02}, across the five evaluation datasets included in Cryo-Bench. In the few-shot setting (10\% input data), GFMs such as DOFA and TerraMind outperform UNet, achieving mIoU scores of \textbf{59.53}, \textbf{56.62}, and \textbf{56.60}, respectively, compared to UNet's 56.60. When fully fine-tuning GFMs, we observe inconsistent performance across datasets and models. However, tuning the learning rate along with fine-tuning substantially improves GFM performance. For example, evaluation on two representative datasets (GLID and CaFFe) shows an average relative improvement of \textbf{12.77\%}. Despite having minimal Cryosphere representation in their pretraining data, GFMs exhibit notable domain adaptation capabilities and produce meaningful results across tasks. Based on our findings, we recommend encoder fine-tuning with hyperparameter optimization to achieve the best possible performance, and frozen encoders when users need quick results without extensive experimentation. (\href{https://github.com/Sk-2103/Cryo-Bench}{GitHub})
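
The abstract reports results as mIoU and as an average relative improvement without defining either. The sketch below shows one common way to compute both for flattened segmentation label maps. It is an illustration under assumptions, not the benchmark's own code: the function names `mean_iou` and `relative_improvement` are hypothetical, scores are assumed to be on a 0-100 scale, and classes absent from both prediction and ground truth are assumed to be skipped when averaging. The paper's exact averaging convention may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union in percent, averaged over classes that
    appear in the prediction or the ground truth (assumed convention)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")
    ious = []
    for cls in range(num_classes):
        # Pixels labelled cls in both maps.
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels labelled cls in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection / union)
    if not ious:
        raise ValueError("none of the classes occur in pred or target")
    return 100.0 * sum(ious) / len(ious)


def relative_improvement(baseline: float, tuned: float) -> float:
    """Relative change of a tuned score over a baseline score, in percent."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (tuned - baseline) / baseline


# Toy example: six pixels, two classes (0 = background, 1 = target feature).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
gain = relative_improvement(baseline=50.0, tuned=60.0)
```

In the toy example each class has two intersecting pixels out of a union of four, so both per-class IoUs are 0.5. The numbers are made up for illustration and are unrelated to the scores reported in the abstract.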

Metadata
arXiv: 2603.01576v2
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
GeoAI
GIS
RemoteSensing
EarthObservation
cs.CV