Vision foundation models are a new frontier in Geospatial Artificial Intelligence (GeoAI), an interdisciplinary research area that applies and extends AI for geospatial problem solving and geographic knowledge discovery. Their promise lies in enabling powerful image analysis by learning and extracting important image features from vast amounts of geospatial data. This paper evaluates the performance of a first-of-its-kind geospatial foundation model, IBM-NASA's Prithvi, on a crucial geospatial analysis task: flood inundation mapping. The model is compared with convolutional neural network (CNN) and vision transformer (ViT) based architectures in terms of mapping accuracy for flooded areas. The experiments use the Sen1Floods11 benchmark dataset, and each model's predictive performance, generalizability, and transferability are evaluated on both a held-out test set and a dataset entirely unseen by the models. Results show the good transferability of the Prithvi model, highlighting its performance advantage in segmenting flooded areas in previously unseen regions. The findings also indicate areas for improvement for Prithvi: adopting multi-scale representation learning, developing more end-to-end pipelines for high-level image analysis tasks, and offering more flexibility in input data bands.