UrbanComp Lab | Learning Resource Library

Paper

arXiv

GeoAI

GIS

RemoteSensing

EarthObservation

Trajectory

Mobility

Multimodal

GeoMultimodal

Agent

UrbanTraffic

Chinese Title

“咖啡馆入口看起来无障碍吗？门在哪儿？”——面向视觉查询的地理空间AI智能体

English Title

"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

Jon E. Froehlich, Jared Hwang, Zeyu Wang, John S. O'Meara, Xia Su, William Huang, Yang Zhang, Alex Fiannaca, Philip Nelson, Shaun Kane

Published

2025/8/22 01:49:52

Source Type

preprint

Language

Abstract

Chinese Translation

交互式数字地图已彻底改变了人们出行与认知世界的方式；然而，其依赖于地理信息系统（GIS）数据库中预先存在的结构化数据（例如道路网络、兴趣点索引），因而难以回答与现实世界视觉外观相关的地理-视觉问题。本文提出“地理-视觉智能体”（Geo-Visual Agents）的构想：一类多模态AI智能体，能够通过分析大规模地理空间图像库（包括街景图像（如Google街景）、场所关联照片（如TripAdvisor、Yelp）及航拍影像（如卫星图像））并融合传统GIS数据源，理解并回应关于现实世界细致入微的视觉-空间查询。我们阐述该构想的定义，描述感知与交互方法，给出三个示例，并列举未来研究中的关键挑战与机遇。

English Original

Interactive digital maps have revolutionized how people travel and learn about the world; however, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), limiting their ability to address geo-visual questions related to what the world looks like. We introduce our vision for Geo-Visual Agents--multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos) combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.

Resource Links

Paper PDF: arxiv.org/pdf/2508.15752v1

Original source page: arxiv.org/abs/2508.15752v1

Metadata

arXiv: 2508.15752v1

Source: arXiv

Type: Paper

Extraction status: raw

Keywords