Paper
arXiv
GeoAI
GIS
SpatialIntelligence
GeoLargeModel
GeoFoundationModel
English Title
What on Earth is AlphaEarth? Hierarchical structure and functional interpretability for global land cover
Ivan Felipe Benavides-Martinez, Justin Guthrie, Jhon Edwin Arias, Yeison Alberto Garces-Gomez, Angela Ines Guzman-Alvis, Cristiam Victoriano Portilla-Cabrera, Somnath Mondal, Andrew J. Allyn, Auroop R. Ganguly
Published
2026/3/8 10:40:03
Source type
preprint
Language
en
Abstract

English Original

Geospatial foundation models generate high-dimensional embeddings that achieve strong predictive performance, yet their internal organization remains obscure, limiting their scientific use. Recent interpretability studies relate Google AlphaEarth Foundations (GAEF) embeddings to continuous environmental variables, but it is still unclear whether the embedding space exhibits a functional or hierarchical organization, in which some dimensions act as specialized representations while others encode shared or broader geospatial structure. In this work, we propose a functional interpretability framework that reverse-engineers the role of embedding dimensions by characterizing their contribution to land cover structure from observed classification behavior. The approach combines large-scale experimentation with a structural analysis of embedding-class relationships based on feature importance patterns and progressive ablation. Our results show that embedding dimensions exhibit consistent and non-uniform functional behavior, allowing them to be categorized along a hierarchical functional spectrum: specialist dimensions associated with specific land cover classes, low- and mid-generalist dimensions capturing shared characteristics between classes, and high-generalist dimensions reflecting broader environmental gradients. Critically, we find that accurate land cover classification (98% of baseline performance) can be achieved using as few as 2 to 12 of the 64 available dimensions, depending on the class. This demonstrates substantial redundancy in the embedding space and offers a pathway toward significant reductions in computational cost. Together, these findings reveal that AlphaEarth embeddings are not only physically informative, but also functionally organized into a hierarchical structure, providing practical guidance for dimension selection in operational classification tasks.
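
The dimension-selection idea in the abstract can be illustrated with a minimal sketch of progressive ablation on synthetic data. This is not the authors' implementation. The synthetic 64-dimensional data, the class-mean-difference importance score, the nearest-centroid classifier, and the function names (`make_synthetic`, `importance`, `centroid_accuracy`, `progressive_ablation`) are all assumptions made for illustration. Only the 64-dimension width and the 98%-of-baseline threshold are taken from the abstract.

```python
import random


def make_synthetic(n=400, dims=64, informative=(3, 17, 42), seed=0):
    """Synthetic stand-in for 64-dimensional embeddings with a binary label.

    Hypothetical data: only the `informative` dimensions carry class signal.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        for d in informative:
            row[d] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def importance(X, y, d):
    """Absolute difference of class means on dimension d (a simple proxy score)."""
    pos = [x[d] for x, t in zip(X, y) if t == 1]
    neg = [x[d] for x, t in zip(X, y) if t == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def centroid_accuracy(Xtr, ytr, Xte, yte, dims):
    """Test accuracy of a nearest-centroid classifier restricted to `dims`."""
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(Xtr, ytr) if t == c]
        centroids[c] = [sum(r[d] for r in rows) / len(rows) for d in dims]
    correct = 0
    for x, t in zip(Xte, yte):
        dist = {
            c: sum((x[d] - m) ** 2 for d, m in zip(dims, centroids[c]))
            for c in centroids
        }
        correct += min(dist, key=dist.get) == t
    return correct / len(Xte)


def progressive_ablation(X, y, target=0.98):
    """Drop dimensions, least important first, while accuracy stays at or
    above `target` times the all-dimension baseline."""
    half = len(X) // 2
    Xtr, ytr, Xte, yte = X[:half], y[:half], X[half:], y[half:]
    all_dims = list(range(len(X[0])))
    baseline = centroid_accuracy(Xtr, ytr, Xte, yte, all_dims)
    # Ascending importance: the weakest dimensions are ablated first.
    order = sorted(all_dims, key=lambda d: importance(Xtr, ytr, d))
    kept = list(all_dims)
    for d in order:
        trial = [k for k in kept if k != d]
        if not trial:
            break  # always keep at least one dimension
        if centroid_accuracy(Xtr, ytr, Xte, yte, trial) >= target * baseline:
            kept = trial
        else:
            break  # removing this dimension costs too much accuracy
    return kept, centroid_accuracy(Xtr, ytr, Xte, yte, kept), baseline


if __name__ == "__main__":
    X, y = make_synthetic()
    kept, acc, baseline = progressive_ablation(X, y)
    print(f"kept {len(kept)} of {len(X[0])} dimensions: {sorted(kept)}")
    print(f"accuracy {acc:.3f} vs baseline {baseline:.3f}")
```

On real embeddings, one would presumably run this once per land cover class in a one-vs-rest setup and substitute a stronger classifier and importance measure; the abstract does not specify either, so those choices are left open here.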

Metadata
arXiv: 2603.16911v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
GeoAI
GIS
SpatialIntelligence
GeoLargeModel
GeoFoundationModel
cs.LG
cs.AI