SeeFar is an evolving collection of multi-resolution satellite images from public and commercial satellites. We curated this dataset specifically for training geospatial foundation models that are not constrained by satellite type. In recent years, advances in technology have made satellite imagery more accessible than ever: more Earth-observing satellites have been launched in the last five years than in the previous fifty. Modern commercial satellites now offer up to 100 times the spatial resolution of public-access satellites. However, the high cost and limited historical availability of commercial satellite imagery are a barrier to training foundation models and restrict which images can be used during inference. The SeeFar dataset represents a step towards training satellite-agnostic models by combining pre-processed, multi-resolution images from commercial and public-access satellites. This will enable users to utilize historical data alongside higher-resolution, more expensive satellite imagery, offering greater flexibility during inference. To achieve this, we describe a process for standardizing data from diverse satellite sources, normalizing different data formats, and aligning spectral bands to enhance interoperability. The SeeFar dataset includes images of 384x384 pixels spanning four spectral bands (Blue, Green, Red, and Near-Infrared) at an expanding set of spatial resolutions (starting with 30, 10, 1.5, and 1.0 meters), all in cloud-optimized GeoTIFF format. It also provides consistent and comprehensive metadata to enhance data transparency and reliability. By aggregating data from multiple sources, SeeFar makes processed and consistent satellite data accessible to a wider range of users, from researchers to policymakers, fostering competition and innovation in satellite imagery analysis. The dataset is available at \url{coastalcarbon.ai/seefar}.
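To make the image specification above concrete, the following is a minimal sketch of band alignment and tiling. It is an illustration under stated assumptions, not the pipeline used to build SeeFar: the function names, the example source band order, and the choice to discard partial edge tiles are all hypothetical. Only the tile size, the four target bands, and the listed spatial resolutions come from the text.

```python
import numpy as np

# Values taken from the dataset description above.
TILE_SIZE = 384
TARGET_BANDS = ("blue", "green", "red", "nir")
RESOLUTIONS_M = (30.0, 10.0, 1.5, 1.0)


def align_bands(image, source_bands):
    """Reorder a (bands, height, width) array into TARGET_BANDS order.

    `source_bands` names each band of `image` in order. Bands that are not
    in TARGET_BANDS are dropped. (Hypothetical helper, for illustration.)
    """
    index = [source_bands.index(name) for name in TARGET_BANDS]
    return image[index]


def tile_image(image, tile_size=TILE_SIZE):
    """Split a (bands, height, width) array into non-overlapping square tiles.

    Partial tiles at the right and bottom edges are discarded; this is an
    assumption of the sketch, not a documented property of the dataset.
    """
    _, height, width = image.shape
    tiles = []
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tiles.append(image[:, row:row + tile_size, col:col + tile_size])
    return tiles


def footprint_m(resolution_m, tile_size=TILE_SIZE):
    """Ground distance in meters covered by one side of a tile."""
    return tile_size * resolution_m


if __name__ == "__main__":
    # A made-up five-band scene whose bands arrive in a different order.
    scene = np.zeros((5, 800, 900), dtype=np.uint16)
    aligned = align_bands(scene, ("red", "green", "blue", "nir", "swir"))
    print(aligned.shape, len(tile_image(aligned)))
    for res in RESOLUTIONS_M:
        print(f"{res} m/pixel: {footprint_m(res)} m per tile side")
```

Because every tile has the same pixel dimensions, the ground area it covers shrinks as the spatial resolution gets finer: one tile side spans 384 times the pixel size, which is 11,520 m at 30 m per pixel and 384 m at 1.0 m per pixel.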