English Title
GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
Kyeongjin Ahn, Seungeon Lee, Krishna P. Gummadi, Meeyoung Cha
Published
2026/5/19 23:37:01
Source type
preprint
Language
en
Abstract
English Original

Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast, combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through executable programs that yield verifiable rewards, without relying on large-scale human-curated data. Given a satellite or aerial image, our framework employs a single multimodal policy that proposes spatial problems as executable programs and solves them under three reasoning modes (abduction, deduction, and induction) over spatial primitives and an image understanding tool. A verifier executes each program to produce a reward signal that jointly optimizes the two roles (problem proposing and problem solving) via reinforcement learning. GeoX consistently improves its base vision-language models (VLMs) by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated samples. Alongside the proposed method, we release a benchmark for geospatial understanding accumulated through self-play.
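
The abstract does not spell out how the three reasoning modes are checked, so the following is only a minimal sketch of one common reading, not the authors' implementation. It assumes a task is a triple (program, input, output): deduction predicts the output from the program and input, abduction proposes an input that yields a given output, and induction proposes a program consistent with given input/output pairs. The scene, the `count_near` primitive, and the binary reward are all hypothetical stand-ins for illustration.

```python
from math import dist

# Hypothetical scene: detections as (label, x, y) in pixel coordinates.
# In a real system these would come from an image understanding tool.
SCENE = [("building", 10, 10), ("building", 40, 12), ("car", 12, 14), ("car", 80, 80)]

def count_near(scene, target, anchor, radius):
    """Toy spatial primitive: count `target` objects within `radius` of any `anchor`."""
    anchors = [(x, y) for lbl, x, y in scene if lbl == anchor]
    return sum(
        1
        for lbl, x, y in scene
        if lbl == target and any(dist((x, y), a) <= radius for a in anchors)
    )

def verify_deduction(program, args, predicted_output):
    """Deduction: program and input are given; the solver predicts the output."""
    return 1.0 if program(SCENE, **args) == predicted_output else 0.0

def verify_abduction(program, predicted_args, target_output):
    """Abduction: program and output are given; the solver proposes an input."""
    return 1.0 if program(SCENE, **predicted_args) == target_output else 0.0

def verify_induction(predicted_program, examples):
    """Induction: input/output pairs are given; the solver proposes a program."""
    consistent = all(predicted_program(SCENE, **args) == out for args, out in examples)
    return 1.0 if consistent else 0.0

if __name__ == "__main__":
    near = {"target": "car", "anchor": "building", "radius": 10}
    far = {"target": "car", "anchor": "building", "radius": 200}
    print(verify_deduction(count_near, near, 1))
    print(verify_abduction(count_near, far, 2))
    print(verify_induction(count_near, [(near, 1), (far, 2)]))
```

In each mode the reward comes from executing the program rather than from a human label, which is the property the abstract refers to as verifiable.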

Metadata
arXiv: 2605.20006v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
GeoAI
GIS
RemoteSensing
EarthObservation
SpatialIntelligence
Multimodal
GeoMultimodal
cs.AI