News
Google Research Blog
AI
Industry
Dataset
Chinese Title
教会人工智能读取地图
English Title
Teaching AI to read a map
Google Research Blog
Published
2026/2/18 05:37:00
Source type
blog
Language
en
Summary
Chinese Translation

观察一张购物中心或主题公园的地图。在几秒钟内,你的大脑就能处理视觉信息,确定自身位置,并规划出到达目的地的最佳路径。你本能地理解哪些线条代表墙壁,哪些代表通道。这种基本的细粒度空间推理能力对人类而言是自然而然的。给定地图上的起点和终点,模型能够输出一条符合地图约束的有效路径。

English Original

Look at a map of a shopping mall or a theme park. Within seconds, your brain processes the visual information, identifies your location, and traces the optimal path to your destination. You instinctively understand which lines are walls and which are walkways. This fundamental skill — fine-grained spatial reasoning — is second nature. The task for a model is the same: given a start and end location on a map, output a valid path that respects map constraints.

Body

Look at a map of a shopping mall or a theme park. Within seconds, your brain processes the visual information, identifies your location, and traces the optimal path to your destination. You instinctively understand which lines are walls and which are walkways. This fundamental skill — fine-grained spatial reasoning — is second nature. The task for a model is the same: given a start and end location on a map, output a valid path that respects map constraints.

The most direct way to teach this would be to collect a massive dataset of maps with millions of paths traced by hand. But annotating a single path with pixel-level accuracy is a painstaking process, and scaling it to the level required for training a large model is practically impossible. Furthermore, many of the best examples of complex maps — like those for malls, museums, and theme parks — are proprietary and cannot be easily collected for research. This data bottleneck has held back progress: without sufficient training examples, models lack the "spatial grammar" to interpret a map correctly. They see a soup of pixels, not a structured, navigable space.

To address this data gap, we designed a fully automated, scalable pipeline that leverages the generative capabilities of Gemini models to produce diverse, high-quality maps. This process allows fine-grained control over data diversity and complexity, generating annotated paths that adhere to intended routes and avoid non-traversable regions, without the need for collecting large-scale real-world maps. The pipeline works in four automated and scalable stages, using AI models as both creators and critics to ensure quality and produce pixel-level annotations.
First, we use a large language model (LLM) to generate rich, descriptive prompts for different types of maps — everything from "a map of a zoo with interconnected habitats" to "a shopping mall with a central food court" or "a fantasy theme park with winding paths through different themed lands." These text prompts are then fed into a text-to-image model that renders them into complex map images.

Once we have a map image, we need to identify all the "walkable" areas. Our system does this by clustering the pixels by color to create candidate path masks — essentially, a black-and-white map of all the walkways.

With a clean mask of all traversable areas, we convert that 2D image into a more structured graph format. Think of this as creating a digital version of a road network, where intersections are nodes and the roads between them are edges. This "pixel graph" captures the connectivity of the map, making it easy to calculate routes computationally.

This pipeline enabled us to create a dataset of 2M annotated map images with valid paths. While the generated images occasionally exhibit typographic errors, this study focuses primarily on path fidelity; we anticipate that ongoing advancements in generative modeling will naturally mitigate these artifacts in future iterations.

Fine-tuning on our dataset substantially improved the models' abilities across the board. The fine-tuned Gemini 2.5 Flash model, for example, saw its normalized dynamic time warping (NDTW) score drop significantly (from 1.29 to 0.87), achieving the best overall performance. These gains confirm our central hypothesis: fine-grained spatial reasoning is not an innate property of multimodal large language models (MLLMs) but an acquired skill. With the right kind of explicit supervision, even if it is synthetically generated, we can teach models to understand and navigate spatial layouts.

Qualitative examples compare the fine-tuned Gemini 2.5 Flash (red) to the base model (blue).
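The mask-to-graph-to-route stages described above can be sketched end to end. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy image, the color-threshold mask, the 4-connected pixel graph, and unit-cost Dijkstra routing are all simplifications.

```python
import heapq

# Toy "map": each cell holds an (r, g, b) color. In the real pipeline the
# image comes from a text-to-image model; here it is hand-crafted.
# Hypothetical palette: light gray = walkway, dark gray = wall.
WALK, WALL = (200, 200, 200), (40, 40, 40)
image = [
    [WALL, WALL, WALL, WALL, WALL, WALL, WALL],
    [WALL, WALK, WALK, WALK, WALL, WALK, WALL],
    [WALL, WALK, WALL, WALK, WALK, WALK, WALL],
    [WALL, WALK, WALL, WALL, WALL, WALK, WALL],
    [WALL, WALL, WALL, WALL, WALL, WALL, WALL],
]

def color_mask(img, centroid, tol=30):
    """Mask stage: mark pixels whose color is near the walkway centroid."""
    near = lambda c: all(abs(a - b) <= tol for a, b in zip(c, centroid))
    return [[near(px) for px in row] for row in img]

def mask_to_graph(mask):
    """Graph stage: 4-connected pixel graph over traversable cells."""
    h, w = len(mask), len(mask[0])
    graph = {}
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            graph[(y, x)] = [
                (y + dy, x + dx)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
            ]
    return graph

def shortest_path(graph, start, goal):
    """Routing stage: Dijkstra with unit edge costs from start to goal."""
    dist, prev, pq = {start: 0}, {}, [(0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nb in graph.get(node, []):
            nd = d + 1
            if nd < dist.get(nb, float("inf")):
                dist[nb], prev[nb] = nd, node
                heapq.heappush(pq, (nd, nb))
    if goal not in dist:
        return None  # goal unreachable from start
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

mask = color_mask(image, WALK)
graph = mask_to_graph(mask)
path = shortest_path(graph, (1, 1), (1, 5))
print(path)  # 7 waypoints: through the corridor at row 2, not the dead end
```

The returned waypoint sequence is exactly the kind of pixel-level path annotation the pipeline attaches to each generated map.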
The fine-tuned model adheres more closely to the intended routes and avoids non-traversable regions. The ability to reason about paths and connectivity unlocks a host of future applications.
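The NDTW figures quoted above are based on dynamic time warping (DTW) between the predicted and ground-truth point sequences. A minimal sketch follows; normalizing by the reference path length is an assumption for illustration, not necessarily the paper's exact formula.

```python
import math

def dtw(pred, ref):
    """Dynamic time warping cost between two 2-D point sequences."""
    n, m = len(pred), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(pred[i - 1], ref[j - 1])
            # Best of: advance pred, advance ref, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def ndtw(pred, ref):
    """DTW normalized by reference path length (lower is better)."""
    length = sum(math.dist(a, b) for a, b in zip(ref, ref[1:])) or 1.0
    return dtw(pred, ref) / length

ref = [(0, 0), (0, 1), (0, 2), (0, 3)]
print(ndtw(ref, ref))                               # identical paths score 0.0
print(ndtw([(1, 0), (1, 1), (1, 2), (1, 3)], ref))  # a parallel offset path scores worse
```

Unlike pointwise error, DTW tolerates paths sampled at different densities, which is why it suits comparing a model's traced route against a ground-truth route.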

Resource Links
MapTrace project page: artemisp.github.io/maptrace
MapTrace: Scalable Data Generation for Route Tracing on Maps: arxiv.org/abs/2512.19609
MapBench: arxiv.org/abs/2503.14607
HuggingFace dataset (2M question-answer pairs): huggingface.co/datasets/google/MapTrace
Imagen 4: deepmind.google/models/imagen
Gemini 2.5 Flash: docs.cloud.google.com...i/generative-ai/docs/models/gemini/2-5-flash
Gemini 2.5 Pro: docs.cloud.google.com...-ai/generative-ai/docs/models/gemini/2-5-pro
Gemma 3 27B: huggingface.co/google/gemma-3-27b-it
Dijkstra's algorithm: en.wikipedia.org/wiki/Dijkstra%27s_algorithm
Dynamic time warping: en.wikipedia.org/wiki/Dynamic_time_warping
Graph: en.wikipedia.org/wiki/Graph
Google Research on GitHub: github.com/google-research
Original source page: research.google/blog/teaching-ai-to-read-a-map
Metadata
Source: Google Research Blog
Type: News
Extraction status: raw
Keywords
Machine Perception
Open Source Models & Datasets
AI
Industry
Dataset