UrbanComp Lab | Learning Resource Library

Paper

arXiv

GeoAI

GIS

RemoteSensing

EarthObservation

SpatialIntelligence

Multimodal

GeoMultimodal

Chinese Title

GeoX：通过自我对弈与可验证奖励掌握地理空间推理

English Title

GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards

Kyeongjin Ahn, Seungeon Lee, Krishna P. Gummadi, Meeyoung Cha

Published

2026/5/19 23:37:01

Source Type

preprint

Language

Abstract

Chinese Translation

地理空间推理要求在场景复杂的空间结构上求解图像锚定的问题。然而，该能力的发展受限于标注庞大且组合爆炸式增长的问题空间所需高昂成本。我们提出 GeoX，一种自我对弈框架，通过可执行程序获取空间逻辑，并基于可验证奖励进行学习，无需依赖大规模人工构建的数据。给定一张卫星或航拍图像，本框架采用单一多模态策略，将空间问题表述为可执行程序，并在三种推理模式——溯因、演绎与归纳——下，利用空间基元及图像理解工具求解这些问题。验证器执行每个程序，生成奖励信号，联合优化两个角色（问题生成与问题求解）的强化学习目标。GeoX 在平均指标上使其基础视觉语言模型（VLM）提升最高达 5.5 分，性能匹配或超越在数百万条人工标注数据上训练的传统基线方法。除所提方法外，我们还发布了一个通过自我对弈积累构建的地理空间理解基准。

English Original

Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through executable programs that yield verifiable rewards, without relying on large-scale human-curated data. Given a satellite or aerial image, our framework employs a single multimodal policy that proposes spatial problems as executable programs and solves them under three reasoning modes (abduction, deduction, and induction) over spatial primitives and an image understanding tool. A verifier executes each program to produce a reward signal that jointly optimizes the two roles via reinforcement learning. GeoX consistently improves its base VLMs by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated examples. Alongside the proposed method, we release a benchmark for geospatial understanding accumulated through self-play.
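The verifiable-reward idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bounding-box format, the primitives `count` and `left_of`, the `det` dictionary standing in for the output of an image understanding tool, and the binary reward are all hypothetical choices made for illustration. Only a deduction-style check is shown (the solver predicts a program's output); the abstract does not specify how abduction and induction are verified, so they are left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def count(boxes: List[Box]) -> int:
    """Hypothetical primitive: number of detected objects."""
    return len(boxes)


def left_of(a: Box, b: Box) -> bool:
    """Hypothetical primitive: is the center of `a` left of the center of `b`?"""
    return (a[0] + a[2]) / 2 < (b[0] + b[2]) / 2


PRIMITIVES = {"count": count, "left_of": left_of}


def run_program(source: str, detections: Dict[str, List[Box]]) -> Optional[object]:
    """Execute a proposed program and return its `answer` variable.

    Returns None if the program raises or never sets `answer`.
    Emptying __builtins__ is NOT a real sandbox; it only keeps the sketch small.
    """
    env = {"__builtins__": {}, **PRIMITIVES, "det": detections}
    try:
        exec(source, env)
    except Exception:
        return None
    return env.get("answer")


def verify(source: str, detections: Dict[str, List[Box]], predicted: object) -> float:
    """Assumed binary reward: 1.0 if the solver's prediction matches the executed result."""
    truth = run_program(source, detections)
    if truth is None:
        return 0.0  # invalid programs earn no reward
    return 1.0 if predicted == truth else 0.0


# Stand-in for detections on one aerial image (made-up values).
detections = {
    "building": [(0, 0, 10, 10), (20, 0, 30, 10)],
    "road": [(40, 0, 50, 10)],
}
program = "answer = count(det['building'])"
reward = verify(program, detections, predicted=2)
```

Because the reward comes from executing the program rather than from a human label, a check of this kind can score arbitrarily many proposed questions, which is the property the abstract attributes to verifiable rewards.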

Resource Links

Paper PDF: arxiv.org/pdf/2605.20006v1
Original source page: arxiv.org/abs/2605.20006v1

Metadata

arXiv: 2605.20006v1

Source: arXiv

Type: Paper

Extraction status: raw

Keywords