We present ongoing research on agency primitives for GeoAI assistants: core capabilities that connect foundation models to the artifact-centric, human-in-the-loop workflows in which GIS practitioners actually work. Satellite image captioning, visual question answering, and promptable segmentation have all advanced. Yet these capabilities have not translated into productivity gains for practitioners, who spend most of their time producing vector layers, raster maps, and cartographic products. The gap stems not from model capability alone but from the absence of an agency layer that supports iterative collaboration. We propose a vocabulary of nine primitives for such a layer, including navigation, perception, geo-referenced memory, and dual modeling, along with a benchmark that measures human productivity. Our goal is a vocabulary that makes agentic assistance in GIS implementable, testable, and comparable.